Django queue function calls - python

I have a small problem with the nature of data processing and Django.
I have a webpage with an advanced DHTMLX table. When rows are added, DHTMLX automatically sends POST data to my Django backend, where it is processed, and the resulting XML is returned to the page. All of this works just fine when adding one row at a time. But when adding several rows at once, a problem appears. I have checked the order in which the data is sent to the backend and it is correct (say rows with IDs 1, 2, 3, 4 are sent in that order), and the requests usually arrive in the same order, despite the randomness of the Internet. But Django fires the same view for each request as soon as it arrives, and it is a complex function that takes some time to compute before sending the response. Every call changes the database, and one of the variables depends on the current size of the table being altered. When the same table is altered in the wrong order (because threads run at different speeds), the resulting data is rubbish.
Is there any ready-made way to queue calls to a single view function, so that each call goes into the queue and waits for the previous one to complete?
I want such a queue for this one function only.

It seems like you should build the queue in Django. If the rows need to be processed serially on the backend, then insert the change data into a queue and process the queue like an event handler.
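A minimal sketch of that idea, assuming a single Django process: a module-level queue with one worker thread serializes the processing, while each request still waits for its own result so DHTMLX gets its XML response as before. process_row() and the add_row view are hypothetical stand-ins for the question's existing code.

    import queue
    import threading

    from django.http import HttpResponse

    _row_queue = queue.Queue()

    def _worker():
        # Single worker thread: rows are processed strictly in arrival order.
        while True:
            row_data, done, result = _row_queue.get()
            try:
                result["xml"] = process_row(row_data)  # existing order-sensitive logic
            finally:
                done.set()
                _row_queue.task_done()

    threading.Thread(target=_worker, daemon=True).start()

    def add_row(request):
        # Each request waits for its turn, so the table is altered in order.
        done = threading.Event()
        result = {}
        _row_queue.put((request.POST.dict(), done, result))
        done.wait()
        return HttpResponse(result.get("xml", "<error/>"), content_type="text/xml")

Note that this only serializes calls within one process; if the site runs several worker processes, you would need a database lock or an external queue instead.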
You could build a send queue using DHTMLX's event handlers and the AJAX callback handler, but why? The network is already slow; slowing it down further is the wrong approach.


Can Flask or Django handle concurrent tasks?

What I'm trying to accomplish:
I have a sensor that is constantly reading in data. I need to display this data in a UI as it appears. While that task is running, the user should also be able to write data to the sensor. Ideally, both tasks would happen at the same time. Currently the program is written in Flask, but if Django (or some third-party library) would be better suited, I would be willing to switch. Note: this website will never be deployed, so there is no need to worry about that; the only user will be me, running the program from my laptop.
I have spent a lot of time researching Flask's async functions and coroutines, but I have not seen any clear indication of whether something like this is possible.
I'm not looking for a line-by-line solution, rather a way (async, threading, etc.) to set up the code so that the tasks above are possible. All help is appreciated, thanks.
I'm a Django guy, so I'll throw out what I think could be possible.
Django doesn't ship a threading decorator itself, but a small @start_new_thread decorator (a few lines around Python's threading module) can be put on any function to make it run in a background thread; there's a sketch after these notes.
You could make a view, POST to it with JavaScript/AJAX, and start a thread that communicates with the sensor using the POSTed data.
You could also make a threaded function that reads from the sensor.
It could be started by a management command, or by a 'start' button that POSTs to a view which then starts the thread.
Note: you need locks or some other synchronization so the two threads don't conflict when reading/writing.
Maybe it's a single thread that reads/writes to the sensor, and on each loop it checks whether there's anything to write (the existence and contents of a file? Maybe a DB entry?).
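A sketch of that decorator plus a reader loop guarded by a lock; sensor.read() and the readings store are placeholders for whatever the real sensor API and storage look like.

    import threading
    from functools import wraps

    sensor_lock = threading.Lock()  # shared by the reader and writer threads

    def start_new_thread(func):
        # Run the decorated function in a daemon thread and return immediately.
        @wraps(func)
        def wrapper(*args, **kwargs):
            thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
            thread.start()
            return thread
        return wrapper

    @start_new_thread
    def read_sensor_forever(sensor, readings):
        # Append each new reading to a shared store the UI can poll.
        while True:
            with sensor_lock:          # don't collide with a write to the sensor
                value = sensor.read()  # hypothetical sensor API
            readings.append(value)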
As for the UI, let's say a webpage: your best bet would be WebSockets, but since you're the only one who will ever use it, you could just write some JavaScript/AJAX that polls a view every x seconds and displays the new data on the page (a minimal view for this follows below).
Note: polling is a cruder substitute for WebSockets, which keep a persistent connection open instead of pinging, but for a single user it's close enough.
The common thread is JavaScript/AJAX, so the page doesn't need to reload and you can watch the data come in continuously.
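The polled endpoint can be as small as this; readings is the shared store the reader thread appends to in the sketch above.

    from django.http import JsonResponse

    def latest_data(request):
        # Hand the accumulated readings to the AJAX poller as JSON.
        return JsonResponse({"readings": list(readings)})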
You can probably do all of this in Flask if you find a similar threading approach and just add some JavaScript to the frontend.
Hopefully you find some of this useful, and I don't know why Stack Overflow hates these types of questions... they're perfectly fine.

How Python handles asynchronous REST requests

Scenario: let's say I have a REST API written in Python (using Flask, maybe) that stores a global variable. The API has two endpoints: one that reads the variable and returns it, and one that writes it. Now I have two clients that call both endpoints at the same time (one reads, one writes).
I know that in Python multiple threads will not actually run in parallel (due to the GIL), but some I/O operations behave asynchronously. Would this scenario cause any conflict? And how does it behave? I'm assuming that the request that "wins the race" will block the other request (is that right?).
In short: you should rethink your REST API design and implement some kind of FIFO queue.
You have two endpoints (W for writing and R for reading). Let's say the global variable has some value V0 in the beginning. If client A reads from R while at the same time client B writes a new value V1 to W, two things can happen:
The read request is faster. Client A will read V0.
The write request is faster. Client A will read V1.
You won't run into an inconsistent memory state, thanks to the GIL you mentioned, but which of the two cases above happens is completely unpredictable. One time the read request may be slightly faster, the next time the write request. Much of the request handling is done by your operating system (e.g. address resolution or TCP connection management), and the requests may also traverse other machines such as routers or switches in your network. All of these things are completely out of your control and could delay the read request slightly more than the write request, or the other way around. So no matter how many threads your REST server runs with, the returned value is essentially unpredictable.
If you really need ordered read/write interaction, you can make the resource a FIFO queue. Each time any client reads, it pops the first element from the queue; each time any client writes, it pushes that element onto the end of the queue. If you do this, you are guaranteed not to lose any data to overwriting, and the data is read in the same order it is written.
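A minimal Flask sketch of this, assuming a JSON payload with a "value" field; the route names are made up, and queue.Queue is used because it is already thread-safe.

    import queue

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    values = queue.Queue()  # thread-safe FIFO shared by both endpoints

    @app.route("/value", methods=["POST"])
    def write_value():
        values.put(request.get_json()["value"])  # push onto the end of the queue
        return jsonify(status="queued")

    @app.route("/value", methods=["GET"])
    def read_value():
        try:
            return jsonify(value=values.get_nowait())  # pop the oldest element
        except queue.Empty:
            return jsonify(value=None), 404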

Flask and long running tasks

I'm writing a small web server using Flask that needs to do the following things:
On the first request, serve the basic page and kick off a long (15-60 second) data processing task. The data processing task queries a second server which I do not control, updates a local database, and then performs some calculations on the results to show in the web page.
The page issues several AJAX requests that all depend on parts of the result from the long task, so I need to wait until the processing is done.
Subsequent requests for the first page would ideally re-use the previous request's result if they come in while the processing task is ongoing (or even shortly thereafter).
I tried using flask-cache (specifically SimpleCache), but ran into an issue: the cache pickles the result, when I'd really rather keep the exact object.
I suppose I could rewrite what I'm caching to be pickleable and then implement a single worker thread to do the processing.
Is there a better way of handling this kind of workflow?
I think the best way to handle long data processing is something like Celery (a sketch follows).
Send a request to start the task and receive a task ID.
Periodically send AJAX requests to check the task's progress, and receive the result of the task execution when it finishes.
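A minimal sketch of that flow; the Redis broker/backend URLs and the task body are placeholders, and the two endpoints mirror the two steps above.

    # tasks.py -- broker/backend URLs and the task body are placeholders.
    from celery import Celery

    celery_app = Celery("tasks",
                        broker="redis://localhost:6379/0",
                        backend="redis://localhost:6379/0")

    @celery_app.task
    def process_data():
        # Query the second server, update the local database, run the calculations.
        return {"summary": "done"}

    # app.py -- start the task, then let the page poll for its state by ID.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/start", methods=["POST"])
    def start():
        task = process_data.delay()  # enqueue and return immediately
        return jsonify(task_id=task.id)

    @app.route("/status/<task_id>")
    def status(task_id):
        res = process_data.AsyncResult(task_id)
        return jsonify(state=res.state,
                       result=res.result if res.ready() else None)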

Storing task state between multiple django processes

I am building a logging bridge between RabbitMQ messages and a Django application, to store background-task state in the database for further investigation/review and to make it possible to re-publish tasks via the Django admin interface.
I guess it's nothing fancy, just a standard Producer-Consumer pattern.
The web application publishes to the message queue and inserts the initial task state into the database.
The consumer, which is a separate Python process, handles the message and updates the task state depending on the task's output.
The problem is that some tasks are missing from the DB and are therefore never executed.
I suspect this is because the consumer receives the message before the DB commit has been performed.
So basically, returning from Model.save() doesn't mean the transaction has been committed, and the whole exchange breaks down.
Is there any way I could fix this? Maybe some kind of post_transaction signal I could use?
Thank you in advance.
Web application publishes to message queue and inserts initial task state into the database
Do not do this.
Web application publishes to the queue. Done. Present results via template and finish the web transaction.
A consumer fetches from the queue and does things. For example, it might append a log entry to the database for presentation to the user. The consumer may also post additional status to the database as it executes things.
Indeed, many applications have multiple queues with multiple producer/consumer relationships. Each process might append things to a log.
The presentation must then summarize the log entries. Often, the last one is a sufficient summary, but sometimes you need a count or information from earlier entries.
This sounds brittle to me: You have a web app which posts to a queue and then inserts the initial state into the database. What happens if the consumer processes the message before the web app can commit the initial state?
What happens if the web app tries to insert the new state while the DB is locked by the consumer?
To fix this, the web app should add the initial state to the message and the consumer should be the only one ever writing to the DB.
[EDIT] You might also have an issue with logging. Check that races between the web app and the consumer produce the appropriate errors in the log, e.g. by putting a message on the queue without modifying the DB.
[EDIT2] Some ideas:
How about showing just the number of pending tasks? For this, the web app could write into table 1 and the consumer into table 2, and the admin interface would show the difference.
Why can't the web app see the pending tasks which the consumer has in the queue? Maybe you should have two consumers. The first consumer just adds the task to the DB, commits, and then sends a message to the second consumer with just the primary key of the new row. The admin interface could read the table while the second consumer writes to it.
Last idea: Commit the transaction before you enqueue the message. For this, you simply have to send "commit" to the database. It will feel odd (and I certainly don't recommend it for any case) but here, it might make sense to commit the new row manually (i.e. before you return to your framework which handles the normal transaction logic).
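For what it's worth, current Django (1.9+) provides exactly the post-transaction hook the question asks about: django.db.transaction.on_commit runs a callback only after the surrounding transaction commits. A sketch, where TaskState and publish_task stand in for the question's model and RabbitMQ publisher:

    from django.db import transaction

    def create_task(request):
        # TaskState and publish_task are hypothetical stand-ins.
        task = TaskState.objects.create(status="pending")
        # The callback fires only after the transaction commits, so the
        # consumer can never see the message before the row exists.
        transaction.on_commit(lambda: publish_task(task.pk))
        ...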

Stopping long-running requests in Pylons

I'm working on an application using Pylons and I was wondering if there was a way to make sure it doesn't spend way too much time handling one request. That is, I would like to find a way to put a timer on each request such that when too much time elapses, the request just stops (and possibly returns some kind of error).
The application is supposed to allow users to run some complex calculations but I would like to make sure that if a calculation starts taking too much time, we stop it to allow other calculations to take place.
Rather than terminate a request with an error, a better approach might be to perform long-running calculations in a separate thread (or threads) or process (or processes):
When a calculation request is received, it is added to a queue and identified with a unique ID. You redirect to a results page referencing that ID, which can show a "Please wait, calculating" message and a refresh button (or auto-refresh via a meta tag).
The thread or process doing the calculation pops requests from the queue and updates the stored result (and perhaps progress information too), which the results-page handler presents to the user on refresh.
When the calculation is complete, the results page has no refresh button or meta tag and just shows the final result.
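A bare-bones sketch of the worker side; run_calculation and the function names are placeholders, and wiring submit_calculation and the results lookup into controllers depends on the app's Pylons routes.

    import queue
    import threading
    import uuid

    jobs = queue.Queue()
    results = {}  # job_id -> None while running, the final value when done

    def _worker():
        while True:
            job_id, params = jobs.get()
            results[job_id] = run_calculation(params)  # the long computation
            jobs.task_done()

    threading.Thread(target=_worker, daemon=True).start()

    def submit_calculation(params):
        job_id = str(uuid.uuid4())
        results[job_id] = None
        jobs.put((job_id, params))
        return job_id  # redirect the user to the results page for this ID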
