If I have for example this simple TCP server:
import logging

from twisted.internet import reactor
from twisted.web.resource import Resource
from twisted.web.server import Site
from resources import SomeResource

logging.info("Starting server...")
root = Resource()
root.putChild("test", SomeResource())
reactor.listenTCP(8080, Site(root))
reactor.run()
Here SomeResource has render_GET and render_POST methods, for example.
Then I know I can just send a POST/GET to hostname:8080/test.
But now I want to make it more complicated: I would like to do something like hostname:8080/test/status.
Could that be defined inside SomeResource() as a method, or do I have to define a new resource for every different URL?
If you want everything that goes to /test/... to reach the render methods (render_GET/render_POST) of SomeResource, just define it as a leaf:
class SomeResource(Resource):
    isLeaf = True
If you want to look at the part after "test/", request.postpath will include that.
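For example, a minimal sketch of such a leaf resource, dispatching on request.postpath (the "status" branch and the page bodies are hypothetical):

from twisted.web.resource import Resource

class SomeResource(Resource):
    isLeaf = True  # every request under /test reaches render_GET/render_POST

    def render_GET(self, request):
        # for GET /test/status, request.postpath == ['status']
        if request.postpath == ['status']:
            return "<html><body>status</body></html>"
        return "<html><body>default</body></html>"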
I am attempting to build an API that will run a Scrapy web spider when requested via a websocket message.
I would like to forward the logging output to the websocket client so you see what's going on in the - sometimes quite long-running - process. When finished, I will also send the scraped results.
As it is possible to run Scrapy in-process, I would like to do exactly that. I found a solution here that streams an external process's output to a websocket, but that doesn't seem right if it's possible to run Scrapy inside the server.
https://tomforb.es/displaying-a-processes-output-on-a-web-page-with-websockets-and-python
There are two ways I can imagine to make this work in Twisted: somehow using a LogObserver, or defining a LogHandler (probably a StreamHandler with StringIO) and then handling the stream in some way in Twisted with autobahn.websocket classes like WebSocketServerProtocol.
Now I am quite stuck and don't know how to connect the ends.
Could someone please provide a short example how to stream logging output from twisted logging (avoiding a file if possible) to a websocket client?
I managed to solve this by myself somehow and wanted to let you know how I did it:
The basic idea was to have a process that gets called remotely and output a streaming log to a client, usually a browser.
Instead of building all the nasty details myself, I decided to go with Autobahn and Crossbar.io, which provide pub/sub and RPC via the WAMP protocol, essentially just JSON over WebSockets: exactly what I had planned to build, just way more advanced!
Here is a very basic example:
import logging

from twisted.internet.defer import inlineCallbacks
from autobahn.twisted.wamp import ApplicationSession
from example.spiders.basic_spider import BasicSpider
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


class PublishLogToSessionHandler(logging.Handler):
    """Forward every log record to a WAMP pub/sub channel."""

    def __init__(self, session, channel):
        logging.Handler.__init__(self)
        self.session = session
        self.channel = channel

    def emit(self, record):
        self.session.publish(self.channel, record.getMessage())


class AppSession(ApplicationSession):
    configure_logging(install_root_handler=False)

    @inlineCallbacks
    def onJoin(self, details):
        logging.root.addHandler(PublishLogToSessionHandler(self, 'com.example.crawler.log'))

        # REGISTER a procedure for remote calling
        def crawl(domain):
            runner = CrawlerRunner(get_project_settings())
            runner.crawl("basic", domain=domain)
            return "Running..."

        yield self.register(crawl, 'com.example.crawler.crawl')
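For completeness, a client-side session could subscribe to the log topic and kick off the crawl via RPC; this is only a sketch, assuming a Crossbar.io router routes both sessions (the class name and the domain argument are placeholders):

from twisted.internet.defer import inlineCallbacks
from autobahn.twisted.wamp import ApplicationSession

class LogWatcherSession(ApplicationSession):
    @inlineCallbacks
    def onJoin(self, details):
        # receive each published log line as it happens
        def on_log(message):
            print message
        yield self.subscribe(on_log, 'com.example.crawler.log')
        # trigger the crawl remotely
        status = yield self.call('com.example.crawler.crawl', 'example.com')
        print status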
Basically, we can call XML-RPC handlers in the following way:
import xmlrpclib

s = xmlrpclib.ServerProxy('http://remote_host/rpc/')
print s.system.listMethods()
In Tornado we can integrate it like this:
import xmlrpclib
import tornado.web

s = xmlrpclib.ServerProxy('http://remote_host/rpc/')

class MyHandler(tornado.web.RequestHandler):
    def get(self):
        result = s.system.listMethods()
I have the following, somewhat newbie, questions:
Will result = s.system.listMethods() block Tornado?
Are there any non-blocking XML-RPC clients around?
How can we achieve result = yield gen.Task(s.system.listMethods)?
1. Yes, it will block Tornado, since xmlrpclib uses blocking Python sockets (as it is).
2. Not that I'm aware of, but I'll provide a solution where you can keep xmlrpclib and still have it run asynchronously.
3. My solution doesn't use tornado.gen.
OK, so one useful library to keep in mind whenever you're doing networking and need to write async code is gevent; it's a really good, high-quality library that I would recommend to everyone.
Why is it good and easy to use?
You can write asynchronous code in a synchronous manner (which makes it easy).
All you have to do is monkey-patch with one simple line:
from gevent import monkey; monkey.patch_all()
When using Tornado you need to know two things (that you may already know):
Tornado only supports asynchronous views when acting as an HTTP server (WSGI isn't supported for async views).
Async views need to terminate the response themselves, which you do by calling either self.finish() or self.render() (which calls self.finish()).
OK, so here's an example illustrating what you would need, with the necessary gevent integration with Tornado:
# Python imports
import functools

# Gevent imports: patch the standard sockets first,
# so xmlrpclib's socket calls become non-blocking
from gevent import monkey; monkey.patch_all()
import gevent

# Tornado imports
import tornado.ioloop
import tornado.web
import tornado.httpserver

# XML-RPC imports
import xmlrpclib

# Asynchronous gevent decorator
def gasync(func):
    @tornado.web.asynchronous
    @functools.wraps(func)
    def f(self, *args, **kwargs):
        return gevent.spawn(func, self, *args, **kwargs)
    return f

# Our XML-RPC service
xml_service = xmlrpclib.ServerProxy('http://remote_host/rpc/')

class MyHandler(tornado.web.RequestHandler):
    @gasync
    def get(self):
        # This doesn't block tornado thanks to gevent,
        # which patches all of xmlrpclib's socket calls
        # so they no longer are blocking
        result = xml_service.system.listMethods()

        # Do something here

        # Write response to client
        self.write('hello')
        self.finish()

# Our URL mappings
handlers = [
    (r"/", MyHandler),
]

def main():
    # Setup app and HTTP server
    application = tornado.web.Application(handlers)
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8000)

    # Start ioloop
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()
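To try it, start the script and request the handler from another terminal:
$ curl http://localhost:8000/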
So give the example a try (adapting it to your needs, obviously) and you should be good to go.
No need to write any extra code: gevent does all the work of patching Python sockets so they can be used asynchronously, while you still write code in a synchronous fashion (which is a real bonus).
Hope this helps :)
I do not think so.
Tornado has its own ioloop, while gevent's ioloop is based on libevent, so gevent will block Tornado's ioloop.
I'm writing a web application that connects to a database. I'm currently using a variable in a module that I import from other modules, but this feels nasty.
# server.py
from hexapoda.application import application

if __name__ == '__main__':
    from paste import httpserver
    httpserver.serve(application, host='127.0.0.1', port='1337')

# hexapoda/application.py
from mongoalchemy.session import Session

db = Session.connect('hexapoda')

import hexapoda.tickets.controllers

# hexapoda/tickets/controllers.py
from hexapoda.application import db

def index(request, params):
    tickets = db.query(Ticket)
The problem is that I get multiple connections to the database (I guess that is because I import application.py in two different modules, so the Session.connect() function gets executed twice).
How can I access db from multiple modules without creating multiple connections (i.e. only call Session.connect() once in the entire application)?
Try the Twisted framework with something like:
from twisted.enterprise import adbapi

class db(object):
    def __init__(self):
        self.dbpool = adbapi.ConnectionPool('MySQLdb',
                                            db='database',
                                            user='username',
                                            passwd='password')

    def query(self, sql):
        # runInteraction returns a Deferred that fires when the
        # interaction has run in a pool thread
        return self.dbpool.runInteraction(self._query, sql)

    def _query(self, tx, sql):
        tx.execute(sql)
        print tx.fetchone()
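A minimal usage sketch (the SQL and the shutdown wiring are just placeholders for illustration):

from twisted.internet import reactor

pool = db()
d = pool.query("SELECT 1")
# stop the reactor once the interaction has completed
d.addCallback(lambda _: reactor.stop())
reactor.run()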
That's probably not what you want to do - a single connection per app means that your app can't scale.
The usual solution is to connect to the database when a request comes in and store that connection in a variable with "request" scope (i.e. it lives as long as the request).
A simple way to achieve that is to put it in the request:
request.db = ...connect...
Your web framework probably offers a way to annotate methods, or something like a filter which sees all requests; put the code to open/close the connection there, as sketched below.
If opening connections is expensive, use connection pooling.
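Since this question uses paste/WSGI, one way to get request scope is a small WSGI middleware; this is only a sketch (the environ key is my own choice, and the cleanup call is hypothetical; check mongoalchemy's API for the real one):

from mongoalchemy.session import Session

class DatabaseMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # open one connection per request and expose it to handlers
        session = Session.connect('hexapoda')
        environ['hexapoda.db'] = session
        try:
            return self.app(environ, start_response)
        finally:
            session.end()  # hypothetical cleanup call

# wrap the WSGI app once, e.g. in server.py:
# application = DatabaseMiddleware(application)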
I am trying to have a set of Python scripts report their status to a set of microcontrollers.
My idea is to have each Python script create its own web page that the microcontrollers can view. But is there any way to have the script itself keep the page served (i.e. an Apache-like library), so that if the script crashes or is not running the page is not served? Or a way to make the page show a default value if the script is not running?
You can also have a look at twisted.web
A very basic example:
from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor

class StatusPageResource(Resource):
    isLeaf = True

    def __init__(self, param1):
        self.param1 = param1
        # Call the constructor of the super class
        Resource.__init__(self)

    def render_GET(self, request):
        return "<html><body>%s</body></html>" % self.param1

my_res = Resource()
my_res.putChild('GetStatusPage1', StatusPageResource(param1='abc'))
my_res.putChild('GetStatusPage2', StatusPageResource(param1='xyz'))

factory = Site(my_res)
reactor.listenTCP(8080, factory)
print 'Running on port 8080'
reactor.run()
Now point your browser to http://localhost:8080/GetStatusPage1 (for example)
You could use http://docs.python.org/library/simplehttpserver.html or some minimal http server framework like http://flask.pocoo.org/ or http://www.cherrypy.org/.
If you want to feed "live" information to your microcontrollers, also have a look at Comet-style long-polling requests. You essentially keep downloading "the page" forever and analyse it as a data stream, while the server keeps adding updated info at the "end of the page", as in the sketch below.
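For illustration, a minimal streaming sketch in twisted.web (the one-second interval and the status text are made up):

from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.internet import reactor, task

class StreamingStatus(Resource):
    isLeaf = True

    def render_GET(self, request):
        # keep the response open and append a status line every second
        loop = task.LoopingCall(request.write, "status: ok\n")
        loop.start(1.0)
        # stop writing when the client disconnects
        request.notifyFinish().addErrback(lambda _: loop.stop())
        return NOT_DONE_YET

reactor.listenTCP(8081, Site(StreamingStatus()))
reactor.run()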
So I've looked around at a few things involving writing an HTTP proxy using Python and the Twisted framework.
Essentially, like some other questions, I'd like to be able to modify the data that will be sent back to the browser. That is, the browser requests a resource and the proxy fetches it. Before the resource is returned to the browser, I'd like to be able to modify ANY content (HTTP headers AND body).
This ( Need help writing a twisted proxy ) was what I initially found. I tried it out, but it didn't work for me. I also found this ( Python Twisted proxy - how to intercept packets ), which I thought would work; however, I can only see the HTTP requests from the browser.
I am looking for any advice. Some thoughts I have are to use the ProxyClient and ProxyRequest classes and override their functions, but I read that the Proxy class itself is a combination of the two.
For those who may ask to see some code, it should be noted that I have worked with only the above two examples. Any help is great.
Thanks.
To create a ProxyFactory that can modify server response headers and content, you could override the ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    """Mangle returned headers and content here.

    Use `self.father` methods to modify the request directly.
    """
    def handleHeader(self, key, value):
        # change response header here
        log.msg("Header: %s: %s" % (key, value))
        proxy.ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        # change response part here
        log.msg("Content: %s" % (buffer[:50],))
        # make all content upper case
        proxy.ProxyClient.handleResponsePart(self, buffer.upper())

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    protocols = dict(http=ProxyClientFactory)

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy
I got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost"  # serve on localhost:8080

if __name__ == '__main__':  # $ python proxy_modify_request.py
    import sys
    from twisted.internet import endpoints, reactor

    def shutdown(reason, reactor, stopping=[]):
        """Stop the reactor."""
        if stopping:
            return
        stopping.append(True)
        if reason:
            log.msg(reason.value)
        reactor.callWhenRunning(reactor.stop)

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    d.addErrback(shutdown, reactor)
    reactor.run()
else:  # $ twistd -ny proxy_modify_request.py
    from twisted.application import service, strports
    application = service.Application("proxy_modify_request")
    strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
For a two-way proxy using Twisted, see this article:
http://sujitpal.blogspot.com/2010/03/http-debug-proxy-with-twisted.html