My Django app talks to a SOAP service using the suds-jurko library:

from suds.client import Client

URL = "http://192.168.12.11/xdwq/some_service.asmx?WSDL"

try:
    client = Client(URL, timeout=30)
except Exception:
    # Fallback mode
    client = None

def get_data(ID):
    try:
        response = client.service.GetData(ID)
        data = response.diffgram.NewDataSet.master
        return data
    except Exception:
        return None
In my views:

    data = get_data(ID)

The problem is that the service takes quite some time to initialize (~20 seconds). Subsequent requests take up to 3 seconds to return. Whenever the page is requested, the web server (Apache with mod_wsgi) takes quite a while to respond on some requests.
In my Apache configuration:

    WSGIDaemonProcess www.example.com user=hyde group=hyde threads=15 maximum-requests=10000

How do I write my code so that Apache (or Django) can share a single background process for the SOAP service and avoid paying the initialization penalty on requests?
I have been reading about celery and other such methods but am unsure how to proceed. Please advise.
You must create a separate background process, using pure Python or a third-party module (for example Celery, as mentioned), and communicate with that process from your Django views (over Unix or TCP sockets, for example).
Also, instead of mod_wsgi you can serve the Django application with a different method (gunicorn, uWSGI) that persists in memory, but this is a really dirty solution and I don't recommend it.
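A minimal sketch of the background-process idea, using only the stdlib: a small long-lived TCP service owns the slow-to-initialize SOAP client and answers lookups from the Django views. The port, the line-based protocol, and the use of a plain function in place of the real suds client are all assumptions for illustration.

```python
import socket
import socketserver

def make_handler(get_data):
    """Build a handler that answers one line-based query per connection."""
    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            ident = self.rfile.readline().decode().strip()
            result = get_data(ident) or ""
            self.wfile.write(result.encode() + b"\n")
    return Handler

class SoapProxy(socketserver.ThreadingTCPServer):
    """Long-lived process holding the expensive client; started once."""
    allow_reuse_address = True

def query(ident, host="127.0.0.1", port=9090, timeout=5):
    """Called from a Django view: ask the proxy process for data."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(ident.encode() + b"\n")
        return conn.makefile().readline().strip()
```

In the real proxy process you would create the suds `Client` once at startup and pass a small wrapper around `client.service.GetData` to `make_handler`; the ~20-second initialization then happens once, not once per Apache worker.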
Related
I'm very new to web dev, and I'm trying to build a simple web interface with Ajax calls to refresh data, and TurboGears2 as the backend.
My Ajax calls are working fine and make periodic calls to my TurboGears2 server; however, these calls take time to complete (some requests make the server run SSH commands on other machines, which takes up to 3-4 seconds).
My problem is that TurboGears waits for each request to complete before handling the next one, so all my concurrent Ajax calls are being queued instead of being all processed in parallel.
To refresh N values takes 3*N seconds where it could just take 3 seconds with concurrency.
Any idea how to fix this?
Here is my current server-side code (method get_load is the one called with Ajax):
class RootController(TGController):
    @expose()
    def index(self):
        with open("index.html") as data:
            index = data.read()
        return index

    @expose()
    def get_load(self, ip):
        command = "bash get_cpu_load.sh"
        # stdout=PIPE is needed so communicate() actually captures the output
        request = subprocess.Popen(["ssh", "-o", "ConnectTimeout=2", ip, command],
                                   stdout=subprocess.PIPE)
        load = str(request.communicate()[0])
        return load
Your problem is probably caused by the fact that you are serving requests with the Gearbox wsgiref server. By default the wsgiref server is single threaded and so can serve only a single request at a time. That can be changed by providing the wsgiref.threaded = true configuration option in the server section of your development.ini (the same section where the IP address and port are specified). See https://github.com/TurboGears/gearbox#gearbox-http-servers and http://turbogears.readthedocs.io/en/latest/turbogears/gearbox.html#changing-http-server for additional details.
Note that wsgiref is the development server for TurboGears and its use in production is usually discouraged. You should consider something like waitress, chaussette or mod_wsgi when deploying your application; see http://turbogears.readthedocs.io/en/latest/cookbook/deploy/index.html?highlight=deploy
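For example, assuming a standard quickstarted project layout, the relevant part of development.ini would look like this (host and port are whatever your project already uses):

```ini
[server:main]
use = egg:gearbox#wsgiref
host = 127.0.0.1
port = 8080
wsgiref.threaded = true
```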
I have an Elastic Beanstalk application which is running a web server environment and a worker tier environment. My goal is to pass some parameters to an endpoint in the web server which submits a request to the worker which will then go off and do a long computation and write the results to an S3 bucket. For now I'm ignoring the "long computation" part and just writing a little hello world application which simulates the workflow. Here's my Flask application:
from flask import Flask, request
import boto3
import json

application = Flask(__name__)

@application.route("/web")
def test():
    data = json.dumps({"file": request.args["file"], "message": request.args["message"]})
    boto3.client("sqs").send_message(
        QueueUrl="really_really_long_url_for_the_workers_sqs_queue",
        MessageBody=data)
    return data

@application.route("/worker", methods=["POST"])
def worker():
    data = request.get_json()
    boto3.resource("s3").Bucket("myBucket").put_object(Key=data["file"], Body=data["message"])
    return data["message"]

if __name__ == "__main__":
    application.run(debug=True)
(Note that I changed the worker's HTTP Path from the default / to /worker.) I deployed this application to both the web server and to the worker, and it does exactly what I expected. Of course, I had to do the usual IAM configuration.
What I don't like about this is the fact that I have to hard code my worker's SQS URL into my web server code. This makes it more complicated to change which queue the worker polls, and more complicated to add additional workers, both of which will be convenient in production. I would like some code which says "send this message to whatever queue worker X is currently polling". It's obviously not a huge deal, but I thought I would see if anyone knows a way to do this.
Given the nature of the queue URLs, you may want to try keeping them in some external storage (an in-memory database or key-value store, perhaps) that associates the URLs with the IDs of the workers currently using them. That way you can update them as need be without having to modify your application. (The downside would be that you then have [an] additional source[s] of data to maintain and you'd need to write the interfacing code for both the server and workers.)
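A sketch of that registry idea: a tiny key-value mapping from worker IDs to the queue URLs they poll. In production this mapping would live in something shared (Redis, DynamoDB, a config table); the worker name and URL below are made up for illustration.

```python
class QueueRegistry:
    """Maps worker IDs to the SQS queue URLs they currently poll."""

    def __init__(self):
        self._queues = {}

    def register(self, worker_id, queue_url):
        self._queues[worker_id] = queue_url

    def url_for(self, worker_id):
        """Return the queue URL worker_id currently polls."""
        return self._queues[worker_id]

registry = QueueRegistry()
registry.register("worker-x",
                  "https://sqs.us-east-1.amazonaws.com/123456789012/worker-x-queue")
```

The web server's view then becomes `send_message(QueueUrl=registry.url_for("worker-x"), ...)`. If your queue *names* are stable even when URLs change, another option is boto3's `get_queue_url(QueueName=...)`, which resolves the URL at runtime.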
I'm using gevent + bottle for following:
call API method on remote server
Process result from the API
return HTML
I've set a tiemout for the API call (httplib/socket), but if it's set to 5 seconds (for example), my python script is busy for that time and can't return any other pages (which is normal).
Question:
Can I somehow make a clever use of gevent (in a separate script, maybe?) to handle such long requests?
I was thinking of starting a separate API-interrogating script on localhost:8080 and putting it behind a load balancer (as "Internet" suggested), but I'm sure there must be a better way.
I am not an experienced programmer, so thank you for your help!
Actually, your problem should not exist. The gevent server backend can handle any number of requests at the same time. If one is blocked for 5 seconds, that does not affect the other requests arriving at the server. That's the point of the gevent server backend.
1) Are you sure that you use the gevent server backend properly? And not just a monkey-patched version of the wsgiref default server (which is single-threaded)?
2) Did you start the server via bottle.py --server gevent? If not, did you call gevent.monkey.patch_all() before importing all the other socket-related stuff (including bottle)?
Example:
from gevent import monkey
monkey.patch_all()

import bottle
import urllib2

@bottle.route(...)
def callback():
    urllib2.urlopen(...)

bottle.run(server='gevent')
I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.
I would suggest writing a small Flask (or even raw WSGI… But it's probably simpler to use Flask, as it will be easier to get up and running quickly) application which loads the genome database then exposes a simple API. Something like this:
app = Flask(__name__)
database = load_database()

@app.route('/get_genomes')
def get_genomes():
    return database.all_genomes()

app.run(debug=True)
Or, you know, something a bit more sensible.
Also, if you need to be handling more than one request at a time (I believe that app.run will only handle one at a time), start by threading… And if that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory).
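The os.fork() approach can be sketched like this. A toy dict stands in for the multi-gigabyte suffix tree; on POSIX the forked child shares the parent's pages copy-on-write, so the expensive load happens exactly once, before the fork.

```python
import os

def query_in_child(database, key):
    """Fork a worker and answer a query from the already-loaded database."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: the database is already in memory, inherited from
        # the parent; no disk load happens here.
        os.write(write_fd, database[key].encode())
        os._exit(0)
    # Parent process: collect the child's answer.
    os.close(write_fd)
    result = os.read(read_fd, 4096).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return result

genomes = {"chr1": "ACGTACGT"}  # stand-in for the huge suffix-tree object
```

A real prefork server would fork N long-lived workers in a loop and hand each one a listening socket, but the memory-sharing mechanism is the same.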
I have a Django web application. I also have a spell server written using twisted running on the same machine having django (running on localhost:8090). The idea being when user does some action, request comes to Django which in turn connects to this twisted server & server sends data back to Django. Finally Django puts this data in some html template & serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in I create a simple twisted client to connect to the locally run twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() is causing a problem. Since it's an event loop. The next time this same code is executed by Django, I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. Right now it seems you already have a very simple binary protocol interface; you can implement an equally simple Python client using the standard lib's socket interface in blocking mode.
However, I suggest playing around with twisted.web and expose a simple web interface. You can use JSON to serialize and deserialize data - which is well supported by Django. Here's a very quick example:
import json

from twisted.web import server, resource
from twisted.python import log

class Root(resource.Resource):
    def getChild(self, path, request):
        # represents / on your web interface
        return self

class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)

class SpellingSite(server.Site):
    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet

site = SpellingSite()
application = service.Application('WebSpell')

# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first class service allows you to run it on another machine one day if you find the need - exposing a web interface makes it easy to horizontally scale it behind a reverse proxying load balancer too.
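On the Django side, the client is then just an HTTP request plus a JSON decode. A sketch (Python 3 urllib for brevity, even though the server code above is Python 2 era; the /spell path matches the putChild('spell', ...) above, while the 'word' query argument is a made-up example):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def spell(word, base="http://localhost:8090"):
    """Call the Twisted web interface and decode its JSON reply."""
    url = "%s/spell?word=%s" % (base, quote(word))
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode())
```

Because the transport is plain HTTP+JSON, the same client works unchanged whether the spell service runs locally or behind a load balancer on another machine.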
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; your Django application can then use blockingCallFromThread to invoke any Twisted API as it would a blocking API. You will need a little bit of cooperation from your WSGI container, though, because you will need to make sure that this background Twisted thread is started and stopped at appropriate times (when your interpreter is initialized and torn down, respectively).
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
You should stop the reactor after you get results from the Twisted server, or when some error or timeout happens. So on each Django request that needs to query your Twisted server you would run the reactor and then stop it. But that's not supported by the Twisted library — the reactor is not restartable. Possible solutions:
Use a separate thread for the Twisted reactor, but you will need to deploy your Django app with a server that supports long-running threads (I don't know any of these offhand, but you can write your own easily :-)).
Don't use Twisted for implementing the client protocol; just use the plain stdlib socket module.
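The second option (a plain blocking socket client) can be as small as this; the host/port and the newline-delimited request/reply protocol are assumptions about your spell server:

```python
import socket

def spell_check(word, host="127.0.0.1", port=8090, timeout=5):
    """Blocking client for the spell server: one query and one reply per line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(word.encode() + b"\n")
        return conn.makefile().readline().strip()
```

Called from a Django view, this blocks for at most `timeout` seconds and needs no reactor at all.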