Better ways to handle AppEngine requests that time out? - python

Sometimes, with requests that do a lot, Google AppEngine returns an error. I have been handling this by some trickery: memcaching intermediate processed data and just requesting the page again. This often works because the memcached data does not have to be recalculated and the request finishes in time.
However... this hack requires seeing an error, going back, and clicking again. Obviously less than ideal.
Any suggestions?
inb4: "optimize your process better", "split your page into sub-processes", and "use taskqueue".
Thanks for any thoughts.
Edit - To clarify:
A long wait is OK for these requests because the function is administrative. I'm basically looking to run a data-mining function: I'm searching over my datastore and modifying a bunch of objects. I think the correct answer is that AppEngine may not be the right tool for this; I should be exporting the data to a machine where I can run functions like this on my own. It seems AppEngine is really intended for serving requests with lighter processing demands. Maybe the quota/pricing model should offer the option to increase processing timeouts and charge extra.

If interactive user requests are hitting the 30 second deadline, you have bigger problems: your user has almost certainly given up and left anyway.
What you can do depends on what your code is doing. There's a lot to be optimized by batching datastore operations, or reducing them by changing how you model your data; you can offload work to the Task Queue; for URLFetches, you can execute them in parallel. Tell us more about what you're doing and we may be able to provide more concrete suggestions.
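For illustration, here is a hedged sketch of two of those ideas using the legacy Python 2 App Engine libraries (batched ndb writes plus chaining work through the task queue via deferred); MyModel, its processed flag, and the batch size are made-up stand-ins for whatever the real data-mining step does:

from google.appengine.ext import deferred, ndb

class MyModel(ndb.Model):
    processed = ndb.BooleanProperty(default=False)  # illustrative field

def mine_batch(cursor=None, batch_size=200):
    """Process one batch of entities, then hand the rest to the task queue."""
    entities, next_cursor, more = MyModel.query().fetch_page(
        batch_size, start_cursor=cursor)
    for entity in entities:
        entity.processed = True          # per-entity work goes here
    ndb.put_multi(entities)              # one batched write instead of N separate puts
    if more:
        # each small batch fits comfortably in a request deadline; the task
        # queue (10-minute limit per task) chains the next one
        deferred.defer(mine_batch, cursor=next_cursor)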

I have been handling something similar by building a custom automatic retry dispatcher on the client. Whenever an ajax call to the server fails, the client will retry it.
This works very well if your page is ajaxy. If your app serves entire HTML pages, you can use a two-pass process: first send an empty page containing only an ajax request. Then, when AppEngine receives that ajax request, it outputs the same HTML you had before. If the ajax call succeeds, the client fills the DOM with the result; if it fails, it retries once.

Can Flask or Django handle concurrent tasks?

What I'm trying to accomplish:
I have a sensor that is constantly reading in data. I need to print this data to a UI whenever new data appears. While that task is taking place, the user should be able to write data to the sensor. Ideally, both of these tasks would happen at the same time. Currently, I have the program written using Flask, but if Django (or a third-party library) would be better suited, I would be willing to make the switch. Note: this website will never be deployed, so no need to worry about that. The only user will be me, running the program from my laptop.
I have spent a lot of time researching Flask async functions and coroutines; however, I have not seen any clear indication of whether something like this is possible.
Not looking for a line by line solution. Rather, a way (async, threading etc) to set up the code such that the aforementioned tasks are possible. All help is appreciated, thanks.
I'm a Django guy, so I'll throw out what I think could be possible
There's a @start_new_thread decorator pattern (a small wrapper around threading.Thread rather than something built into Django) that can be put on any function so that it runs in a thread; see the sketch below.
You could make a view, POST to it with Javascript/Ajax and start a thread for communication with the sensor using the data POSTed.
You could also make a threading function that will read from the sensor
This could be a management command or a 'start' button that POSTs to a view which then starts the thread.
Note: You need to do Locks or some other logic so the two threads don't conflict when reading/writing
Maybe it's a single thread that reads/writes to the sensor and, on each loop, checks if there's anything to write (the existence + contents of a file? maybe a db entry?).
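A minimal sketch of that idea, assuming the decorator is a small custom wrapper around threading.Thread (it is not shipped with Django); write_to_sensor and the lock are illustrative:

import functools
import threading

sensor_lock = threading.Lock()  # so the reader and writer don't hit the sensor at once

def start_new_thread(func):
    """Run the decorated function in a daemon thread instead of the request thread."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
        thread.start()
        return thread
    return wrapper

@start_new_thread
def write_to_sensor(data):
    with sensor_lock:
        pass  # replace with the actual sensor write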
For the UI, let's say a webpage: your best bet would be WebSockets, but because you're the only one who will ever use it, you could just write some Javascript/Ajax that pings a view every x seconds and displays the new data on the webpage.
Note: that polling approach gets you most of what WebSockets would, except WebSockets push updates over a persistent connection instead of asking every x seconds.
The common thread here is Javascript/Ajax: it lets you keep seeing the data coming in without the page being refreshed.
You can probably do all of this in Flask if you find a similar threading ability and just add some javascript to the frontend
Hopefully you find some of this useful, and idk why stackoverflow hates these types of questions... They're literally fine

Set DNS timeout for HTTP requests using requests library

I have a function that is meant to check if a specific HTTP(S) URL is a redirect and if so return the new location (but not recursively). It uses the requests library. It looks like this:
try:
    response = http_session.head(sent_url, timeout=(1, 1))
    if response.is_redirect:
        return response.headers["location"]
    return sent_url
except requests.exceptions.Timeout:
    return sent_url
Here, the URL I am checking is sent_url. For reference, this is how I create the session:
http_session = requests.Session()
http_adapter = requests.adapters.HTTPAdapter(max_retries=0)
http_session.mount("http://", http_adapter)
http_session.mount("https://", http_adapter)
However, one of the requirements of this program is that it must work for dead links. Based on this, I set a connection timeout (and a read timeout for good measure). After playing around with the values, it still takes about 5-10 seconds for the request to fail with this stacktrace, no matter what value I choose. (Maybe relevant: in the browser, it gives DNS_PROBE_POSSIBLE.)
Now, my problem is: 5-10 seconds is too long to wait for if a link is dead. There are many links that this program needs to check, and I do not want a few dead links to be such a large bottleneck, hence I want to configure this DNS lookup timeout.
I found this post which seems to be relevant (OP wants to increase the timeout, I want to decrease it) however the solution does not seem applicable. I do not know the IP addresses that these URLs point to. In addition, this feature request from years ago seems relevant, but it did not help me further.
So far, the best solution to me seems to just spin up a coroutine for each link/a batch of links and then suck up the timeout asynchronously.
I am on Windows 10, however this code will be deployed on an Ubuntu server. Both use Python 3.8.
So, how can I best give my HTTP requests a very low DNS resolution timeout in the case that it is being fed a dead link?
So, how can I best give my HTTP requests a very low DNS resolution timeout in the case that it is being fed a dead link?
Separate things.
Use urllib.parse to extract the hostname from the URL, and then use dnspython to resolve that name, with whatever timeout you want.
Then, and only if the resolution was correct, fire up requests to grab the HTTP data.
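A minimal sketch of that split, assuming dnspython (>= 2.0) is installed; resolves_quickly and check_url are illustrative names wrapping the question's existing head-request logic:

from urllib.parse import urlsplit

import dns.exception
import dns.resolver
import requests

def resolves_quickly(url, dns_timeout=0.5):
    """Return True only if the URL's hostname resolves within dns_timeout seconds."""
    hostname = urlsplit(url).hostname
    if not hostname:
        return False
    resolver = dns.resolver.Resolver()
    resolver.timeout = dns_timeout    # per-nameserver timeout
    resolver.lifetime = dns_timeout   # total time allowed for the whole lookup
    try:
        resolver.resolve(hostname, "A")   # dnspython < 2.0 uses resolver.query()
        return True
    except (dns.exception.Timeout, dns.resolver.NXDOMAIN,
            dns.resolver.NoAnswer, dns.resolver.NoNameservers):
        return False

def check_url(http_session, sent_url):
    if not resolves_quickly(sent_url):
        return sent_url  # dead DNS: give up immediately instead of waiting 5-10 s
    try:
        response = http_session.head(sent_url, timeout=(1, 1))
        if response.is_redirect:
            return response.headers["location"]
        return sent_url
    except requests.exceptions.Timeout:
        return sent_url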
@blurfus: in requests you can only use the timeout parameter in the HTTP call; you can't attach it to a session. It is not spelled out explicitly in the documentation, but the code is quite clear on that.
There are many links that this program needs to check,
That is a completely separate problem in fact, and exists even if all links are ok, it is just a problem of volume.
The typical solutions fall into two categories:
use asynchronous libraries (they exist for both DNS and HTTP), where your calls are not blocking, you get the data later, so you are able to do something else
use multiprocessing or multithreading to parallelize things and have multiple URLs being tested at the same time by separate instances of your code.
They are not completely mutually exclusive, and there are plenty of pros and cons for each. Asynchronous code may be more complicated to write and understand later, so multiprocessing/multithreading is often the first step for a "quick win" (especially if you do not need to share anything between the processes/threads, otherwise it quickly becomes a problem), yet handling everything asynchronously makes the code scale more nicely with the volume.
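For the multithreading route, a minimal sketch with concurrent.futures; check_one repeats the redirect check so the snippet stands alone, and the pool size is an arbitrary example:

from concurrent.futures import ThreadPoolExecutor

import requests

def check_one(url):
    try:
        response = requests.head(url, timeout=(1, 1))
        if response.is_redirect:
            return response.headers.get("location", url)
        return url
    except requests.exceptions.RequestException:
        return url

def check_all(urls, max_workers=20):
    # each worker blocks on its own slow or dead link, so one bad URL
    # no longer stalls the whole run
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_one, urls))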

Django - Best way to return in-process text from server to JQuery?

I'm building what might end up being a web-app / game for a fun side-project. In the app the user has the ability to kick off a server-side function that will do a few processes on the back end. At times this could take a few minutes, so I'm trying to work out how to return text from Django to the front end without doing a full 'return' from backend to front. Is there a good way of doing this? I experimented with StreamingHttpResponse, but couldn't figure it out, and am not even sure if it's the right solution. Below is a sample of what my flow looks like.
Current state
User hits button on website and there is an Ajax post call to a Django function
This Django function calls ~4 sub-functions, each doing certain processing
After the completion of all sub-functions (~2 minutes of a static front-end), Django returns HTTPResponse of success and triggers a refresh of the front end page
Desired State
User hits button on website and there is an Ajax post call to a Django function
This Django function calls ~4 sub-functions, each doing certain processing
After the completion of sub-function 1 (~30 seconds), a message is returned to the front-end saying 'X process done, 25% complete for this period'
After the completion of sub-function 2 (~30 seconds), a message is returned to the front-end saying 'Y process done, 50% complete for this period'... etc for other functions
After the completion of all sub-functions, Django returns HTTPResponse of success and triggers a refresh of the front end page
Thanks!
Actually, you are trying to achieve full-duplex client-server communication. There are many strategies to make that possible:
WebSockets, chat protocols, IoT protocols, and more.
In Django, there is a nice package called channels to implement that: https://channels.readthedocs.io/en/latest/
Lazy URI resources
A strategy may be to immediately return an HTTPResponse containing a URI that will serve the resource once it becomes available (protected with a security token). You can also expose another endpoint that the front end can call whenever it wants to know the processing progress, passing its token.
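A minimal sketch of that token-and-progress-endpoint idea, using Django's cache as the progress store; the URL layout, the token format and start_processing are assumptions, and the real work would run elsewhere (a thread, a Celery task, ...) and update the same cache key:

import uuid

from django.core.cache import cache
from django.http import JsonResponse

def start_processing(request):
    token = uuid.uuid4().hex
    cache.set("progress:%s" % token, {"percent": 0, "message": "queued"}, 3600)
    # kick off the real work here (thread, Celery task, ...), passing the token along
    return JsonResponse({"token": token, "status_url": "/progress/%s/" % token})

def progress(request, token):
    state = cache.get("progress:%s" % token)
    if state is None:
        return JsonResponse({"error": "unknown or expired token"}, status=404)
    return JsonResponse(state)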
Asynchronous workloads
What I understand is that you currently do the processing synchronously, blocking the HTTP response for the whole duration of the processing tasks. With typical timeout policies, that can be problematic.
And finally, if you plan to scale your app, you will probably want to separate plain HTTP request handling from long-running processing workloads.
For that, you need asynchronous processing, for which Celery is Python's natural candidate.
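Here is a hedged sketch of the Celery flavour of the same polling pattern, assuming a configured Celery app and result backend; run_period and its fixed 25/50/75/100 steps are stand-ins for the question's four ~30-second sub-functions:

import time

from celery import shared_task
from celery.result import AsyncResult
from django.http import JsonResponse

@shared_task(bind=True)
def run_period(self):
    for percent in (25, 50, 75, 100):
        time.sleep(30)  # placeholder for one real sub-function
        self.update_state(state="PROGRESS", meta={"percent": percent})
    return {"percent": 100}

def start(request):
    result = run_period.delay()
    return JsonResponse({"task_id": result.id})

def progress(request, task_id):
    result = AsyncResult(task_id)
    meta = result.info if isinstance(result.info, dict) else {}
    return JsonResponse({"state": result.state, "percent": meta.get("percent", 0)})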

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django 1.8 and Python 2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 milliseconds) between each piece of data that I send:
while True:
    send(data)
    sleep(0.01)
So my question is: is it considered bad practice to simply use sleep() to create that pause? Is there maybe a more efficient approach?
UPDATED:
The reason why I need to create that pause is because the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesn't return anything after receiving, let alone processing, the data. Leaving that brief pause ensures that each chunk of data I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.
Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)
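A hedged sketch of that handoff, assuming Celery is set up; send_to_device is an illustrative stand-in for the question's socket writes, and the explicit task_id is the id the client would poll with:

import time
import uuid

from celery import shared_task
from django.http import JsonResponse

@shared_task
def send_to_device(chunks):
    for chunk in chunks:
        # ... write chunk to the socket here ...
        time.sleep(0.01)  # the pause now ties up a Celery worker, not a web worker

def start_send(request):
    task_id = uuid.uuid4().hex  # specify the id instead of relying on the auto-generated one
    send_to_device.apply_async(args=[request.POST.getlist("chunk")], task_id=task_id)
    return JsonResponse({"task_id": task_id})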
Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+ for example) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at using something like celery/rabbitmq so Django can dump the time delayed part onto something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

Flask and long running tasks

I'm writing a small web server using Flask that needs to do the following things:
On the first request, serve the basic page and kick off a long (15-60 second) data processing task. The data processing task queries a second server which I do not control, updates a local database, and then performs some calculations on the results to show in the web page.
The page issues several AJAX requests that all depend on parts of the result from the long task, so I need to wait until the processing is done.
Subsequent requests for the first page would ideally re-use the previous request's result if they come in while the processing task is ongoing (or even shortly thereafter)
I tried using flask-cache (specifically SimpleCache), but ran into an issue as it seems the cache pickles the result, when I'd really rather keep the exact object.
I suppose I could re-write what I'm caching to be pickle-able, and then implement a single worker thread to do the processing.
Is there a better way of handling this kind of workflow?
I think the best way to handle long data processing is something like Celery.
Send a request to run the task and receive a task ID.
Periodically send ajax requests to check the task's progress and receive the result of the task execution.
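A minimal Flask wiring of that flow, under the assumption of a Redis broker/result backend; process_data stands in for the 15-60 second task, and the routes are illustrative:

import time

from celery import Celery
from celery.result import AsyncResult
from flask import Flask, jsonify

app = Flask(__name__)
celery_app = Celery(__name__, broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task
def process_data():
    # query the second server, update the local database, run the calculations
    time.sleep(30)  # placeholder for the real work
    return {"summary": "done"}

@app.route("/start", methods=["POST"])
def start():
    result = process_data.delay()
    return jsonify({"task_id": result.id})

@app.route("/status/<task_id>")
def status(task_id):
    result = AsyncResult(task_id, app=celery_app)
    return jsonify({"state": result.state,
                    "result": result.result if result.ready() else None})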
