Bokeh Session and Document Polling - python

I am trying to serve bokeh documents via Django using the bokeh-server executable, which creates a Tornado instance. The bokeh documents can be accessed via the URL provided by the Session.object_link method. When it is navigated to, the bokeh-server executable writes this to stdout (IP addresses have been replaced with ellipses):
INFO:tornado.access:200 POST /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/gc (...) 222.55ms
INFO:tornado.access:200 GET /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/ (...) 110.15ms
INFO:tornado.access:200 POST /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/gc (...) 232.66ms
INFO:tornado.access:200 GET /bokeh/bb/71cee48b-5122-4275-bd4f-d137ea1374e5/ (...) 114.16ms
This appears to be communication between the python instance running the Django WSGI app (initialized by Apache running mod_wsgi) and the bokeh-server executable.
When the browser receives the response, including the graphs, data, etc. required for the bokeh interface, there is some initial networking to the browser, followed by further networking whenever the user interacts with graphs that have python callbacks. When the user closes the window or the browser, the networking above continues; it only stops when the Django or bokeh-server process is killed.
In order to start a bokeh session and pass a URL back to the Django template, it is necessary to start the bokeh session in a new thread:
def get_bokeh_url(self, context):
    t = Thread(target=self.run)
    t.start()
    return self.session.object_link(self.document.context)

def run(self):
    return self.session.poll_document(self.document)
self.session and self.document were both initialized before the thread was started. So at the point where get_bokeh_url is called, there are some graphs on the document, some of which have interaction callbacks, and the session has been created but not yet polled via poll_document (which appears to be necessary for interaction to work).
The thread keeps running forever unless you kill either Django or bokeh-server. This means that as more requests come through, more threads build up and the amount of networking increases.
My question is, is there a way to kill the thread once the document is no longer being viewed in a browser?
One solution I have been pondering is to send a quick request to the server when the browser closes and somehow kill the thread for that document. I've tried deleting the documents through the bokeh interface, but this has no effect.

The bokeh server periodically checks whether there are connections to a session. If there have been no connections for some time, the session is expired and destroyed.
As of version 0.12.1, the check interval and maximum connectionless time default to 17 and 60 seconds, respectively. You can override them (both flags take values in milliseconds) by running the server like this:
bokeh serve --check-unused-sessions 1000 --unused-session-lifetime 1000 app.py
This is rather hard to find in the docs: it's described in the CLI documentation and in the developer guide, in the section on Applications, Sessions and Connections of the Server Architecture chapter. There's also a closed GitHub issue on this topic: Periodic callbacks continue after tabs are closed #3770
If you need custom logic whenever a session is destroyed, use the directory deploy format for your app and add a server_lifecycle.py file containing your Lifecycle Hooks, specifically this one:
def on_session_destroyed(session_context):
    ''' If present, this function is called when a session is closed. '''
    pass
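For example, a server_lifecycle.py for the setup above might look like this (a minimal sketch; stop_polling_thread is a hypothetical helper standing in for whatever cleanup your app actually needs):

# server_lifecycle.py -- minimal sketch
def on_session_destroyed(session_context):
    # called by the bokeh server once the session has had no connections
    # for longer than --unused-session-lifetime
    stop_polling_thread(session_context.id)  # hypothetical cleanup helper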

Related

Multiprocess can not work in flask when I deploy flask to server by using flask + uwsgi + nginx

I'm using flask + uwsgi + nginx to deploy a website on a server.
In flask, my code is below. Here is what I want: every time I click "run model", the model should run in another process, while the interface redirects to the waiting page immediately.
train_dataset.status = status
db.session.commit()
text = 'Start training your models, please wait for a while or check results several hours later'
# would run a model in another process
task = Process(target=start_train,
               args=(app.config['UPLOAD_FOLDER'], current_user.id, p.id, p.task),
               name="training.exe")
task.start()
print(task.pid, task.name)
session[f"{project_name}_train"] = task.pid
# in the meanwhile, link to the waiting interface
return render_template('p_computeview.html', project_name=project_name, text=text,
                       status=status, p=p, email=email, not_identify=not_identify)
And when I test in the local development environment with
app.run()
it's fine: when I click "run model", the interface redirects to the waiting page and I can see the model's running logs.
But when I deploy to the server, I chose uwsgi + nginx + flask.
In uwsgi.ini, I already specify the processes and threads:
processes=2
threads=5
But when I click "run model", the interface stalls and doesn't redirect to the waiting page, although I can see the model's running logs; only when the model finishes does the interface redirect to the waiting page (which suggests the Process call was not working?).
My server has 2 CPUs, so I think it can support multiple processes.
Can someone help me? I guess there is some problem in uwsgi or nginx?
The desire to run separate threads or processes within a request context is a common one. For various reasons, except in very narrow circumstances, it is a desire that leads to frustration. In this case, as soon as task goes out of scope, the Process machinery gets torn down.
If you want to start a long-running task from a request handler, use a framework like Celery or RQ, which arrange to run jobs entirely out of process from the HTTP server, as sketched below.
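A minimal sketch with RQ (assuming a Redis server is available and a worker has been started with rq worker; start_train and the surrounding names come from the question):

# sketch: enqueue the training job with RQ instead of multiprocessing.Process
from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

# inside the request handler, instead of Process(...).start():
job = queue.enqueue(start_train,
                    app.config['UPLOAD_FOLDER'], current_user.id, p.id, p.task,
                    job_timeout='12h')  # allow a long-running training job
session[f"{project_name}_train"] = job.id  # look the job up later by id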

Simultaneous requests with turbogears2

I'm very new to web dev, and I'm trying to build a simple web interface with Ajax calls to refresh data, and TurboGears2 as the backend.
My Ajax calls are working fine and make periodic calls to my TurboGears2 server; however, these calls take time to complete (some requests make the server run remote SSH calls on other machines, which take up to 3-4 seconds to complete).
My problem is that TurboGears waits for each request to complete before handling the next one, so all my concurrent Ajax calls are queued instead of being processed in parallel.
Refreshing N values takes 3*N seconds, where it could take just 3 seconds with concurrency.
Any idea how to fix this?
Here is my current server-side code (method get_load is the one called with Ajax):
import subprocess

from tg import TGController, expose

class RootController(TGController):
    @expose()
    def index(self):
        with open("index.html") as data:
            index = data.read()
        return index

    @expose()
    def get_load(self, ip):
        command = "bash get_cpu_load.sh"
        # stdout=subprocess.PIPE is needed for communicate() to capture output
        request = subprocess.Popen(["ssh", "-o ConnectTimeout=2", ip, command],
                                   stdout=subprocess.PIPE)
        load = str(request.communicate()[0])
        return load
Your problem is probably caused by the fact that you are serving requests with the Gearbox wsgiref server. By default the wsgiref server is single-threaded and so can serve a single request at a time. That can be changed by providing the wsgiref.threaded = true configuration option in the server section of your development.ini (the same section where the IP address and port are specified). See https://github.com/TurboGears/gearbox#gearbox-http-servers and http://turbogears.readthedocs.io/en/latest/turbogears/gearbox.html#changing-http-server for additional details.
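For example, the server section might look like this (a sketch; the host and port values are placeholders):

[server:main]
use = egg:gearbox#wsgiref
host = 127.0.0.1
port = 8080
wsgiref.threaded = true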
Note that wsgiref is the development server for TurboGears and its use in production is usually discouraged. You should consider using something like waitress, chaussette or mod_wsgi when deploying your application; see http://turbogears.readthedocs.io/en/latest/cookbook/deploy/index.html?highlight=deploy

Python + uwsgi - multiprocessing and shared app state

We have a flask app running behind uwsgi with 4 processes. It's an API that serves data from one of our two Elasticsearch clusters.
On app bootstrap, each process pulls config from an external DB to check which ES cluster is active, and connects to it.
Every now and then a POST request comes in (from the AWS SNS service) informing all the clients to switch ES cluster. That triggers the same function as on bootstrap: pull config from the DB and reconnect to the active ES cluster.
It works well running as a single process, but when we have more than one process running, only one of them gets updated (the one that picks up the POST request)... while the other processes are still connected to the inactive cluster.
Pulling config on each request to make sure the ES cluster we use is active would be too slow. I'm thinking of installing redis locally and storing the active_es_cluster there... any other ideas?
I think there are two routes you could go down.
1. Have an endpoint "/set_es_cluster" that gets hit by your SNS POST request. This endpoint then sets the key "active_es_cluster", which is read on every ES request by your other processes. The downside is that on each ES request you need to do a redis lookup first.
2. Have a separate process that receives the POST request specifically (I assume the clusters are not changing often). The purpose of this process is to receive the POST request and have uWSGI gracefully restart your other flask processes.
The advantages of the second option:
Don't have to hit redis on every request
Let uWSGI handle the restarts for you (which it does well)
You already setup the config pulling at runtime anyway so it should "just work" with your existing application
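A minimal sketch of option 1 (assuming a local Redis instance; get_es_client is a hypothetical helper that returns an Elasticsearch client for a given cluster name):

# sketch of option 1: look up the active cluster in Redis on each request
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

def set_active_cluster(name):
    # called by the /set_es_cluster endpoint on the SNS POST
    r.set('active_es_cluster', name)

def active_es():
    # called before every ES request; costs one extra Redis round-trip
    cluster = r.get('active_es_cluster').decode()
    return get_es_client(cluster)  # hypothetical per-cluster client cache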

Keeping concurrency in web.py applications on mod_wsgi

Sorry if this makes no sense. Please comment if clarification is needed.
I'm writing a small file upload app in web.py which I am deploying using mod_wsgi + apache. I have been having a problem with my session management and would like clarification on how the threading works in web.py.
Essentially I embed a code in a hidden field of the html page I render when someone accesses my page. The file upload is then done via a standard POST request containing both the file and the code. Then I track the progress of the file by updating it in the file-upload POST method and retrieving it with a GET request to a different class. The 'session' (apologies for it being fairly naive) is stored in a session object like this:
class session:
    def __init__(self):
        self.progress = 0
        self.title = ""
        self.finished = False

    def advance(self):
        self.progress = self.progress + 1
The sessions are all kept in a global dictionary within my app script and then accessed with my code (from earlier) as the key.
For some reason my progress seems to stay at 0 and never increments. I've been debugging for a couple of hours now and I've found that the two session objects referenced from the upload class and the progress class are not the same. The two codes, however, are (as far as I can tell) equal. This is driving me mad, as it worked without any problems on the web.py test server on my local machine.
EDIT: After some research it seems that the dictionary may get copied for every request. I've tried putting the dictionary inside another one and importing that, but this doesn't work. Is there some other way, short of using a database, to 'separate' the sessions dictionary?
Apache/mod_wsgi can run in multiprocess configurations, so your requests may not even be serviced by the same process. They never will be if, in that multiprocess configuration, each process is single-threaded, because while the upload is occurring no other request can be handled by that same process. Read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Possibly you should use mod_wsgi daemon mode with a single, multithreaded daemon process.
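A sketch of the relevant Apache directives (the process-group name and path are placeholders):

# one daemon process with many threads, so all requests share memory
WSGIDaemonProcess uploadapp processes=1 threads=15
WSGIProcessGroup uploadapp
WSGIScriptAlias / /path/to/app.wsgi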
From PEP 333, defining WSGI:
Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server
Check the documentation of your WSGI server.

Python, Twisted, Django, reactor.run() causing problem

I have a Django web application. I also have a spell server written using twisted running on the same machine as django (listening on localhost:8090). The idea is that when the user does some action, a request comes to Django, which in turn connects to this twisted server, and the server sends data back to Django. Finally, Django puts this data in an html template and serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in I create a simple twisted client to connect to the locally run twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() is causing a problem, since it's an event loop: the next time this same code is executed by Django, I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. Right now it seems you already have a very simple binary protocol interface; you could implement an equally simple Python client using the standard lib's socket interface in blocking mode.
However, I suggest playing around with twisted.web and exposing a simple web interface. You can use JSON to serialize and deserialize data, which is well supported by Django. Here's a very quick example:
import json

from twisted.web import server, resource
from twisted.python import log

class Root(resource.Resource):
    def getChild(self, path, request):
        # represents / on your web interface
        return self

class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)

class SpellingSite(server.Site):
    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet
site = SpellingSite()
application = service.Application('WebSpell')
# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first-class service allows you to run it on another machine one day if you find the need; exposing a web interface also makes it easy to scale it horizontally behind a reverse-proxying load balancer.
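On the Django side, querying the service then reduces to an ordinary blocking HTTP request (a sketch; the port and the 'q' query parameter are illustrative assumptions):

import json
from urllib.parse import quote
from urllib.request import urlopen

def spell_check(query):
    # call the twisted.web service and decode its JSON reply
    url = 'http://localhost:8090/spell?q=%s' % quote(query)
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode())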
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; your Django application can then use blockingCallFromThread and call any Twisted API as it would a blocking API. You will need a little bit of cooperation from your WSGI container, though, because you will need to make sure that this background Twisted thread is started and stopped at appropriate times (when your interpreter is initialized and torn down, respectively).
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
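A minimal sketch of the background-reactor approach (get_results is a hypothetical function that uses Twisted APIs and returns a Deferred):

from threading import Thread

from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

# start all of Twisted once, in a daemon thread
Thread(target=reactor.run, kwargs={'installSignalHandlers': False},
       daemon=True).start()

def spell_check(query):
    # blocks the calling (Django) thread until the Deferred fires
    return blockingCallFromThread(reactor, get_results, query)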
You would have to stop the reactor after you get results from the Twisted server, or after some error/timeout. So on each Django request that needs to query your Twisted server, you would run the reactor and then stop it. But this is not supported by the Twisted library: the reactor is not restartable. Possible solutions:
Use a separate thread for the Twisted reactor, but you will need to deploy your django app with a server that supports long-running threads (I don't know any of these, but you can write your own easily :-)).
Don't use Twisted for implementing the client protocol; just use the stdlib's plain socket module, as sketched below.
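A sketch of the second option, assuming the spell server speaks a simple line-based protocol on localhost:8090:

import socket

def spell_query(query, host='localhost', port=8090):
    # plain blocking client; the protocol details are assumptions
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(query.encode() + b'\n')
        return s.recv(4096).decode()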
