Simultaneous requests with turbogears2 - python

I'm very new to web dev, and I'm trying to build a simple web interface with Ajax calls to refresh data, with TurboGears2 as the backend.
My Ajax calls work fine and make periodic requests to my TurboGears2 server; however, these calls take time to complete (some requests make the server run remote SSH commands on other machines, which take up to 3-4 seconds to complete).
My problem is that TurboGears waits for each request to complete before handling the next one, so all my concurrent Ajax calls are queued instead of being processed in parallel.
Refreshing N values takes 3*N seconds when it could take just 3 seconds with concurrency.
Any idea how to fix this?
Here is my current server-side code (the get_load method is the one called via Ajax):
import subprocess

from tg import TGController, expose

class RootController(TGController):
    @expose()
    def index(self):
        with open("index.html") as data:
            index = data.read()
        return index

    @expose()
    def get_load(self, ip):
        command = "bash get_cpu_load.sh"
        # stdout=subprocess.PIPE is needed so communicate() returns the output
        request = subprocess.Popen(
            ["ssh", "-o", "ConnectTimeout=2", ip, command],
            stdout=subprocess.PIPE)
        load = str(request.communicate()[0])
        return load

Your problem is probably caused by the fact that you are serving requests with the Gearbox wsgiref server. By default the wsgiref server is single-threaded and so can serve only one request at a time. That can be changed by setting the wsgiref.threaded = true configuration option in the server section of your development.ini (the same section where the IP address and port are specified). See https://github.com/TurboGears/gearbox#gearbox-http-servers and http://turbogears.readthedocs.io/en/latest/turbogears/gearbox.html#changing-http-server for additional details.
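For reference, a minimal server section with threading enabled might look like the sketch below. The host, port, and `use =` values are placeholders; check the development.ini generated for your project for the exact lines.

```ini
[server:main]
use = egg:gearbox#wsgiref
; allow the development server to handle requests concurrently
wsgiref.threaded = true
host = 127.0.0.1
port = 8080
```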
Note that wsgiref is the development server for TurboGears, and its use in production is usually discouraged. You should consider using something like waitress, chaussette or mod_wsgi when deploying your application; see http://turbogears.readthedocs.io/en/latest/cookbook/deploy/index.html?highlight=deploy

Related

How to prevent the 230 seconds azure gateway timeout using python flask for long running work loads

I have a Python Flask application deployed as an Azure web app, and one function is a compute-intensive workload which takes more than 5 minutes to process. Is there any hack to prevent the gateway timeout error by keeping the TCP connection active between the client and the API while the function is processing the data? A sample of the current code is below.
from flask import Flask

app = Flask(__name__)

@app.route('/data')
def data():
    mydata = super_long_process_function()  # takes more than 5 minutes to process
    return mydata
Since super_long_process_function takes more than 5 minutes, it always times out with 504 Gateway Time-out. One thing I want to mention: this is an idle timeout at the TCP level, which means it is only hit if the connection is idle and no data transfer is happening. So is there any hack in Flask that can be used to prevent this timeout while we process the data? Based on my research and on reading the Microsoft documentation, the 230-second limit cannot be changed for web apps.
In short: the 230 second timeout, as you stated, cannot be changed.
230 seconds is the maximum amount of time that a request can take without sending any data back to the response. It is not configurable.
Source: GitHub issue
The timeout occurs if there's no response; keeping the connection open and sending data will not help.
There are a couple of ways you can go about this. Here are two possible approaches for triggering your long-running tasks without the timeout being an issue:
- Trigger the long-running task with an HTTP call, but don't wait for its completion before returning a response.
- Trigger the task using a messaging mechanism like Storage Queues or Service Bus.
For updating the web application with the result of the long-running task, consider having the response contain a URL the frontend can poll periodically to check for task completion, having the request carry a callback URL to invoke when the task has completed, or implementing Azure Web PubSub to push status updates to the client.
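The first approach (return immediately, let the client poll) can be sketched with plain threads and an in-memory registry. This is an illustration only: the function and endpoint names are made up, and a production app would persist task state in Redis, a database, or a queue rather than a module-level dict.

```python
import threading
import uuid

# In-memory task registry; a real deployment would use shared storage,
# since multiple web workers do not share this dict.
tasks = {}

def start_task(func, *args):
    """Kick off func in a background thread; return an id the client can poll."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "running", "result": None}

    def runner():
        tasks[task_id]["result"] = func(*args)
        tasks[task_id]["status"] = "done"

    threading.Thread(target=runner, daemon=True).start()
    return task_id

def task_status(task_id):
    """What a hypothetical /status/<task_id> endpoint would return."""
    return tasks[task_id]
```

The HTTP endpoint would call start_task, return 202 Accepted with the status URL, and the frontend would poll that URL until the status flips to "done".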

How to adjust timeout for python app served with waitress

I am running a Flask application that has a button in the UI; clicking it calls an endpoint that performs some analysis.
The application is served as follows:
from waitress import serve
serve(app, host="0.0.0.0", port=5000)
After around ~1 minute, I am receiving a gateway timeout in the UI:
504 Gateway Time-out
However, the Flask application keeps doing the work in the background, and after 2 minutes it completes the processing; I can see that it submits the data on the DB side. So the process itself is not timing out.
I already tried passing the channel_timeout argument with a much higher value (the default seems to be 120 seconds), but with no luck. I know it would make more sense to implement this differently so the user doesn't have to wait these two minutes, but I am asking whether such a timeout is set by default and whether it can be increased.
The application is deployed in K8s and the UI is exposed via ingress. Could the timeout come from ingress instead?
The problem was the ingress controller's default timeout.
I got around the issue by changing the implementation and running this job as a background task instead.
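Alternatively, if the long request really must stay open, the ingress timeout itself can usually be raised. Assuming the NGINX ingress controller (other controllers use different annotations, and the resource name here is hypothetical), annotations along these lines extend the proxy timeouts, in seconds:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app            # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
```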

Bokeh Session and Document Polling

I am trying to serve bokeh documents via Django using the bokeh-server executable, which creates a Tornado instance. The bokeh documents can be accessed via a URL provided by the Session.object_link method. When navigated to, the bokeh-server executable writes this to stdout (IP addresses have been replaced with ellipses):
INFO:tornado.access:200 POST /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/gc (...) 222.55ms
INFO:tornado.access:200 GET /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/ (...) 110.15ms
INFO:tornado.access:200 POST /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/gc (...) 232.66ms
INFO:tornado.access:200 GET /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/ (...) 114.16ms
This appears to be communication between the python instance running the Django WSGI app (initialized by Apache running mod_wsgi) and the bokeh-server executable.
When the browser is sent the response, including the graphs and data required for the bokeh interface, there is some initial networking to the browser, followed by more networking whenever the user interacts with graphs that have Python callbacks. When the user closes the window or browser, the same networking continues. Moreover, it only stops when the Django or bokeh-server processes are killed.
In order to start a bokeh session and pass a URL back to the Django template, it is necessary to start the bokeh session in a new thread:
def get_bokeh_url(self, context):
    t = Thread(target=self.run)
    t.start()
    return self.session.object_link(self.document.context)

def run(self):
    return self.session.poll_document(self.document)
self.session and self.document were both initialized before the thread was started. So at the point where get_bokeh_url is called, there are some graphs on the document, some of which have interaction callbacks and session has been created but not polled via poll_document (which appears necessary for interaction).
The thread keeps running forever unless you kill either Django or bokeh-server. This means that when more requests come through, more threads build up and the amount of networking increases.
My question is, is there a way to kill the thread once the document is no longer being viewed in a browser?
One option I have been pondering is to send a quick request to the server when the browser closes and somehow kill the thread for that document. I've tried deleting the documents via the bokeh interface, but this has no effect.
The bokeh server periodically checks whether there are connections to a session. If there have been no connections for some time, the session is expired and destroyed.
As of version 0.12.1, the check interval and maximum connectionless time default to 17 and 60 seconds, respectively. You can override them by running the server like this
bokeh serve --check-unused-sessions 1000 --unused-session-lifetime 1000 app.py
This is rather hard to find in the docs; it's described in the CLI documentation and in the developer guide, in the section on Applications, Sessions and Connections in the Server Architecture chapter. There's also a closed GitHub issue on this topic: Periodic callbacks continue after tabs are closed #3770
If you need custom logic whenever a session is destroyed, use the directory deploy format for your app and add a server_lifecycle.py file containing your Lifecycle Hooks, specifically this one:
def on_session_destroyed(session_context):
    ''' If present, this function is called when a session is closed. '''
    pass

Share a background process in django?

My Django app talks to a SOAP service using the suds-jurko library:
from suds.client import Client

try:
    URL = "http://192.168.12.11/xdwq/some_service.asmx?WSDL"
    client = Client(URL, timeout=30)
except Exception:
    # Fallback mode
    pass

def get_data(ID):
    try:
        response = client.service.GetData(ID)
        data = response.diffgram.NewDataSet.master
        return data
    except Exception:
        return None
In my views
data = get_data(ID)
The problem is that the service takes quite some time to initialize (~20 seconds); subsequent requests take up to 3 seconds to return. Whenever the page is requested, the web server (Apache with mod_wsgi) takes quite a while to respond on some requests.
In my Apache configuration:
WSGIDaemonProcess www.example.com user=hyde group=hyde threads=15 maximum-requests=10000
How do I write my code so that Apache (or Django) can share a single background process for the SOAP service and minimize the start-up penalty?
I have been reading about celery and other such methods but am unsure how to proceed. Please advise.
You must create a separate background process, using pure Python or some third-party module (for example celery, as mentioned), and communicate with that process from your Django views (using Unix or TCP sockets, for example).
Also, instead of WSGI you could serve the Django application by a different method (gunicorn, uwsgi) that persists in memory, but this is really a dirty solution and I don't recommend it.
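The separate-process idea can be sketched with the standard library's multiprocessing module. This is an illustration, not celery: the worker here doubles numbers as a stand-in for the real GetData call, and all names are made up. The point is that the expensive Client(URL) initialization happens once, in the worker, no matter how many web requests arrive.

```python
import multiprocessing as mp

def _worker(conn):
    # Expensive one-time setup (e.g. Client(URL, timeout=30)) happens
    # exactly once, when this process starts. The dict is a stand-in.
    soap_client = {"initialized": True}
    while True:
        request = conn.recv()
        if request is None:      # shutdown sentinel
            break
        # stand-in for soap_client.service.GetData(request)
        conn.send(request * 2)

def start_worker():
    """Spawn the shared worker process; return a pipe for talking to it."""
    parent_end, child_end = mp.Pipe()
    proc = mp.Process(target=_worker, args=(child_end,), daemon=True)
    proc.start()
    return parent_end
```

A real deployment would put the pipe behind a lock (or use a socket server) so that multiple mod_wsgi threads can share the one worker safely.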

Python, Twisted, Django, reactor.run() causing problem

I have a Django web application. I also have a spell server written using Twisted running on the same machine as Django (listening on localhost:8090). The idea is that when the user does some action, the request comes to Django, which in turn connects to this Twisted server, and the server sends data back to Django. Finally, Django puts this data into an HTML template and serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in I create a simple twisted client to connect to the locally run twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() call is causing a problem: since it starts an event loop, the next time this same code is executed by Django I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. Right now it seems you already have a very simple binary protocol interface; you could implement an equally simple Python client using the standard library's socket interface in blocking mode.
However, I suggest playing around with twisted.web and exposing a simple web interface. You can use JSON to serialize and deserialize data, which is well supported by Django. Here's a very quick example:
import json

from twisted.web import server, resource
from twisted.python import log

class Root(resource.Resource):
    def getChild(self, path, request):
        # represents / on your web interface
        return self

class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)

class SpellingSite(server.Site):
    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet

site = SpellingSite()
application = service.Application('WebSpell')

# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first-class service allows you to move it to another machine one day if you find the need; exposing a web interface also makes it easy to scale horizontally behind a reverse-proxying load balancer.
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; then your Django application can use blockingCallFromThread and call any Twisted API as you would a blocking API. You will need a little cooperation from your WSGI container, though, because you must make sure the background Twisted thread is started and stopped at the appropriate times (when your interpreter is initialized and torn down, respectively).
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
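The "event loop in a background thread, blocking calls from request handlers" pattern can be sketched without Twisted using stdlib asyncio. This is an analogue, not Twisted's API: run_coroutine_threadsafe plays the role of blockingCallFromThread, and the lookup coroutine is a made-up stand-in for a call to the spell server.

```python
import asyncio
import threading

# Start a single event loop in a daemon thread at process start-up.
# This mirrors the "run the reactor in a background thread" approach.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

async def lookup(word):
    # stand-in for an asynchronous call to the spell server
    await asyncio.sleep(0.01)
    return word.upper()

def blocking_lookup(word):
    # analogue of Twisted's blockingCallFromThread: hand the coroutine
    # to the background loop and block this thread until it finishes
    return asyncio.run_coroutine_threadsafe(lookup(word), _loop).result()
```

A Django view can then call blocking_lookup like any ordinary function, while the event loop keeps running between requests.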
You should stop the reactor after you get results from the Twisted server, or when some error/timeout happens. So on each Django request that queries your Twisted server you would run the reactor and then stop it. But that is not supported by Twisted: the reactor is not restartable. Possible solutions:
- Use a separate thread for the Twisted reactor, but then you will need to deploy your Django app with a server that supports long-running threads (I don't know of any offhand, but you can write your own easily :-)).
- Don't use Twisted for implementing the client protocol; just use the plain stdlib socket module.
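The second option, a plain blocking socket client, might look like the sketch below. The protocol is an assumption (a simple newline-delimited text exchange); adapt it to whatever the real spell server speaks.

```python
import socket

def spell_check(word, host="127.0.0.1", port=8090, timeout=5.0):
    """Send one word to the spell server and return its reply.

    Assumes a newline-delimited text protocol; the host/port defaults
    match the localhost:8090 setup described in the question.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode() + b"\n")
        return sock.recv(4096).decode().strip()
```

Because each call opens its own short-lived connection, there is no shared reactor state between Django requests, which sidesteps the restart problem entirely.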
