Python - Twisted, Proxy and modifying content

So I've looked around at a few things involving writing an HTTP proxy using Python and the Twisted framework.
Essentially, like some other questions, I'd like to be able to modify the data that will be sent back to the browser. That is, the browser requests a resource and the proxy will fetch it. Before the resource is returned to the browser, I'd like to be able to modify ANY content (HTTP headers AND body).
This ( Need help writing a twisted proxy ) was what I initially found. I tried it out, but it didn't work for me. I also found this ( Python Twisted proxy - how to intercept packets ), which I thought would work; however, I can only see the HTTP requests from the browser.
I am looking for any advice. Some thoughts I have are to use the ProxyClient and ProxyRequest classes and override their methods, but I read that the Proxy class itself is a combination of both.
For those who may ask to see some code, it should be noted that I have only worked with the above two examples. Any help is great.
Thanks.

To create a ProxyFactory that can modify server response headers and content, you could override the ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    """Mangle returned header, content here.

    Use `self.father` methods to modify request directly.
    """
    def handleHeader(self, key, value):
        # change response header here
        log.msg("Header: %s: %s" % (key, value))
        proxy.ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        # change response part here
        log.msg("Content: %s" % (buffer[:50],))
        # make all content upper case
        proxy.ProxyClient.handleResponsePart(self, buffer.upper())

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    protocols = dict(http=ProxyClientFactory)

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy
I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost"  # serve on localhost:8080

if __name__ == '__main__':  # $ python proxy_modify_request.py
    import sys
    from twisted.internet import endpoints, reactor

    def shutdown(reason, reactor, stopping=[]):
        """Stop the reactor."""
        if stopping:
            return
        stopping.append(True)
        if reason:
            log.msg(reason.value)
        reactor.callWhenRunning(reactor.stop)

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    d.addErrback(shutdown, reactor)
    reactor.run()
else:  # $ twistd -ny proxy_modify_request.py
    from twisted.application import service, strports

    application = service.Application("proxy_modify_request")
    strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
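To actually rewrite a header instead of just logging it, handleHeader is the place to do it; a minimal sketch (the header name and replacement value here are just an illustration):

def handleHeader(self, key, value):
    # rewrite the Server header before it reaches the browser;
    # any other header can be changed the same way
    if key.lower() == "server":
        value = "my-modifying-proxy"
    proxy.ProxyClient.handleHeader(self, key, value)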

For a two-way proxy using Twisted, see this article:
http://sujitpal.blogspot.com/2010/03/http-debug-proxy-with-twisted.html

Related

HTTPServer.run requires a BaseHTTPRequestHandler class instead of an object

I have my own class derived from BaseHTTPRequestHandler, which implements my specific GET method. This works fine:
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        """ my implementation of the GET method """

myServer = HTTPServer(("127.0.0.1", 8099), MyHTTPRequestHandler)
myServer.handle_request()
But why do I need to pass my class MyHTTPRequestHandler to the HTTPServer? I know that it is required by the documentation:
class http.server.BaseHTTPRequestHandler(request, client_address, server)
This class is used to handle the HTTP requests that arrive at the server. By itself, it cannot respond to any actual HTTP requests; it must be subclassed to handle each request method (e.g. GET or POST). BaseHTTPRequestHandler provides a number of class and instance variables, and methods for use by subclasses.
The handler will parse the request and the headers, then call a method specific to the request type. The method name is constructed from the request. For example, for the request method SPAM, the do_SPAM() method will be called with no arguments. All of the relevant information is stored in instance variables of the handler. Subclasses should not need to override or extend the __init__() method.
But I do want to pass an instantiated object of my subclass instead. I don't understand why this has been designed like that, and it looks like a design failure to me. The purpose of object-oriented programming with polymorphism is that I can subclass to implement specific behavior behind the same interface, so this seems to me an unnecessary restriction.
That is what I want:
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def __init__(self, myAdditionalArg):
        self.myArg = myAdditionalArg

    def do_GET(self):
        """ my implementation of the GET method """
        self.wfile.write(bytes(self.myArg, "utf-8"))
        # ...

myReqHandler = MyHTTPRequestHandler("mySpecificString")
myServer = HTTPServer(("127.0.0.1", 8099), myReqHandler)
myServer.handle_request()
But if I do that, evidently I receive the expected error message:
TypeError: 'MyHTTPRequestHandler' object is not callable
How can I work around this so that I can still print a specific string?
There is also a second reason why I need this: I want MyHTTPRequestHandler to also provide more information about the client that uses the GET method to retrieve data from the server (I want to retrieve the HTTP headers of the client browser).
I just have one client which starts a single request to the server. If a solution works in a more general context, I'll be happy, but I won't need it for my current project.
Does anybody have an idea how to do that?
A server needs to create request handlers as needed to handle all the requests coming in. It would be bad design to only have one request handler. If you passed in an instance, the server could only ever handle one request at a time and if there were any side effects, it would go very very badly. Any sort of change of state is beyond the scope of what a request handler should do.
BaseHTTPRequestHandler has a method to handle message logging, and an attribute self.headers containing all the header information. It defaults to logging messages to sys.stderr, so you could do $ python my_server.py 2> log_file.txt to capture the log messages. Or you could write to a file in your own handler:
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    log_file = os.path.join(os.path.abspath(''), 'log_file.txt')  # saves in directory where server is
    myArg = 'some fancy thing'

    def do_GET(self):
        # do things
        self.wfile.write(bytes(self.myArg, 'utf-8'))
        # do more
        msg_format = 'start header:\n%s\nend header\n'
        self.log_message(msg_format, self.headers)

    def log_message(self, format_str, *args):
        # basically a copy of the original method, just changing the destination
        with open(self.log_file, 'a') as logger:
            logger.write("%s - - [%s] %s\n" %
                         (self.address_string(),
                          self.log_date_time_string(),
                          format_str % args))

handler = MyHTTPRequestHandler
myServer = HTTPServer(("127.0.0.1", 8099), handler)
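If you really do need per-instance arguments rather than class attributes, a common workaround (a sketch, not part of the answer above) is functools.partial, which binds extra constructor arguments while still giving HTTPServer a callable that produces a fresh handler per request:

import functools
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def __init__(self, myArg, *args, **kwargs):
        # set custom attributes first: BaseHTTPRequestHandler.__init__
        # handles the request immediately
        self.myArg = myArg
        super().__init__(*args, **kwargs)

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(bytes(self.myArg, "utf-8"))

handler = functools.partial(MyHTTPRequestHandler, "mySpecificString")
myServer = HTTPServer(("127.0.0.1", 8099), handler)
myServer.handle_request()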
It is possible to derive a specific HTTPServer class (MyHttpServer) which has the following attributes:

myArg: the specific "message text" which shall be printed by the HTTP request handler
headers: a dictionary storing the headers set by an HTTP request handler

The server class must be packed together with MyHTTPRequestHandler. Furthermore, the implementation only works properly under the following conditions:

only one HTTP request handler requests an answer from the server at a time (otherwise the data kept in the attributes is corrupted)
MyHTTPRequestHandler is only used with MyHttpServer and vice versa (otherwise unknown side effects like exceptions or data corruption can occur)

That's why both classes must be packed and shipped together, like this:
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.wfile.write(bytes(self.server.myArg, 'utf-8'))
        # ...
        self.server.headers = self.headers

class MyHttpServer(HTTPServer):
    def __init__(self, server_address, myArg, handler_class=MyHTTPRequestHandler):
        super().__init__(server_address, handler_class)
        self.myArg = myArg
        self.headers = dict()
The usage of these classes could look like this, where only one request from a client (e.g. a web browser) is answered by the server:
def get_header():
    httpd = MyHttpServer(("127.0.0.1", 8099), "MyFancyText", MyHTTPRequestHandler)
    httpd.handle_request()
    return httpd.headers

Session authentication with Django channels

Trying to get authentication working with Django channels with a very simple websockets app that echoes back whatever the user sends over with a prefix "You said: ".
My processes:
web: gunicorn myproject.wsgi --log-file=- --pythonpath ./myproject
realtime: daphne myproject.asgi:channel_layer --port 9090 --bind 0.0.0.0 -v 2
realtime_worker: python manage.py runworker -v 2
I run all processes when testing locally with heroku local -e .env -p 8080, but you could also run them all separately.
Note I have WSGI on localhost:8080 and ASGI on localhost:9090.
Routing and consumers:
### routing.py ###
from . import consumers

channel_routing = {
    'websocket.connect': consumers.ws_connect,
    'websocket.receive': consumers.ws_receive,
    'websocket.disconnect': consumers.ws_disconnect,
}
and
### consumers.py ###
import traceback
from django.contrib.auth.models import User
from django.http import HttpResponse
from channels.handler import AsgiHandler
from channels import Group
from channels.sessions import channel_session
from channels.auth import channel_session_user, channel_session_user_from_http
from myproject import CustomLogger

logger = CustomLogger(__name__)

@channel_session_user_from_http
def ws_connect(message):
    logger.info("ws_connect: %s" % message.user.email)
    message.reply_channel.send({"accept": True})
    message.channel_session['prefix'] = "You said"
    # message.channel_session['django_user'] = message.user  # tried doing this but it doesn't work...

@channel_session_user_from_http
def ws_receive(message, http_user=True):
    try:
        logger.info("1) User: %s" % message.user)
        logger.info("2) Channel session fields: %s" % message.channel_session.__dict__)
        logger.info("3) Anything at 'django_user' key? => %s" % (
            'django_user' in message.channel_session,))
        user = User.objects.get(pk=message.channel_session['_auth_user_id'])
        logger.info("4) ws_receive: %s" % user.email)
        prefix = message.channel_session['prefix']
        message.reply_channel.send({
            'text': "%s: %s" % (prefix, message['text']),
        })
    except Exception:
        logger.info("ERROR: %s" % traceback.format_exc())

@channel_session_user_from_http
def ws_disconnect(message):
    logger.info("ws_disconnect: %s" % message.__dict__)
    message.reply_channel.send({
        'text': "%s" % "Sad to see you go :(",
    })
And then to test, I go into the JavaScript console on the same domain as my HTTP site and type:
> var socket = new WebSocket('ws://localhost:9090/')
> socket.onmessage = function(e) {console.log(e.data);}
> socket.send("Testing testing 123")
VM481:2 You said: Testing testing 123
And my local server log shows:
ws_connect: test@test.com
1) User: AnonymousUser
2) Channel session fields: {'_SessionBase__session_key': 'chnb79d91b43c6c9e1ca9a29856e00ab', 'modified': False, '_session_cache': {u'prefix': u'You said', u'_auth_user_hash': u'ca4cf77d8158689b2b6febf569244198b70d5531', u'_auth_user_backend': u'django.contrib.auth.backends.ModelBackend', u'_auth_user_id': u'1'}, 'accessed': True, 'model': <class 'django.contrib.sessions.models.Session'>, 'serializer': <class 'django.core.signing.JSONSerializer'>}
3) Anything at 'django_user' key? => False
4) ws_receive: test@test.com
Which, of course, makes no sense. A few questions:
Why would Django see message.user as an AnonymousUser but have the actual user id _auth_user_id=1 (this is my correct user ID) in the session?
I am running my local server (WSGI) on 8080 and daphne (ASGI) on 9090 (different ports), and I didn't include session_key=xxxx in my WebSocket connection - yet Django was able to read my browser's cookie for the correct user, test@test.com? According to the Channels docs, this shouldn't be possible.
Under my setup, what is the best / simplest way to carry out authentication with Django channels?
Note: this answer is specific to Channels 1.x; Channels 2.x uses a different auth mechanism.
I had a hard time with Django channels too; I had to dig into the source code to better understand the docs...
Question 1:
The docs mention this kind of long trail of decorators relying on each other (http_session, http_session_user ...) that you can use to wrap your message consumers. In the middle of that trail it states this:
Now, one thing to note is that you only get the detailed HTTP information during the connect message of a WebSocket connection (you can read more about that in the ASGI spec) - this means we’re not wasting bandwidth sending the same information over the wire needlessly.
This also means we’ll have to grab the user in the connection handler and then store it in the session;....
It's easy to get lost in all that; at least we both did...
You just have to remember what happens when you use channel_session_user_from_http:

1. It calls http_session_user
   a. which calls http_session, which will parse the message and give us a message.http_session attribute.
   b. Upon returning from that call, it initiates message.user based on the information it got in message.http_session (this will bite you later).
2. It calls channel_session, which will initiate a dummy session in message.channel_session and tie it to the message reply channel.
3. Now it calls transfer_user, which will move the http_session into the channel_session.

This happens during the connection handling of a websocket, so on subsequent messages you won't have access to detailed HTTP information. What's happening after the connect is that you're calling channel_session_user_from_http again, which in this situation (post-connect messages) calls http_session_user, which will attempt to read the HTTP information but fail, resulting in setting message.http_session to None and overriding message.user to AnonymousUser.
That's why you need to use channel_session_user in this case.
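In other words, the fix sketched against the question's consumers looks like this (only the decorators change; the bodies are abbreviated):

from channels.auth import channel_session_user, channel_session_user_from_http

@channel_session_user_from_http
def ws_connect(message):
    # HTTP cookies are only visible here; the decorator copies the user
    # from the HTTP session into the channel session
    message.reply_channel.send({"accept": True})

@channel_session_user  # not ..._from_http: no HTTP info after connect
def ws_receive(message):
    message.reply_channel.send({"text": "You said: %s" % message["text"]})

@channel_session_user
def ws_disconnect(message):
    pass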
Question 2:
Channels can use Django sessions either from cookies (if you’re running your websocket server on the same port as your main site, using something like Daphne), or from a session_key GET parameter, which works if you want to keep running your HTTP requests through a WSGI server and offload WebSockets to a second server process on another port.
Remember http_session, that decorator that gets us the message.http_session data? It appears that if it doesn't find a session_key GET parameter, it falls back to settings.SESSION_COOKIE_NAME, which is the regular sessionid cookie. So whether you provide session_key or not, you'll still get connected if you're logged in; of course, that happens only when your ASGI and WSGI servers are on the same domain (127.0.0.1 in this case) - the port difference doesn't matter.
I think the difference that the docs are trying to communicate, but didn't expand on, is that you need to set up the session_key GET parameter when your ASGI and WSGI servers are on different domains, since cookies are restricted by domain, not port.
Due to that lack of explanation, I had to test running ASGI and WSGI on the same port and on different ports, and the result was the same: I was still getting authenticated. I changed one server's domain to 127.0.0.2 instead of 127.0.0.1 and the authentication was gone; setting the session_key GET parameter brought the authentication back again.
Update: a rectification of the docs paragraph was just pushed to the channels repo; it was meant to say domain instead of port, as mentioned above.
Question 3:
My answer is the same as turbotux's, but longer: you should use @channel_session_user_from_http on ws_connect and @channel_session_user on ws_receive and ws_disconnect. Nothing from what you showed suggests it won't work if you make that change. Maybe also try removing http_user=True from your receive consumer? Even though I suspect it has no effect, since it's undocumented and intended only to be used by generic consumers...
Hope this helps!
To answer your first question, you need to use the
channel_session_user
decorator in the receive and disconnect calls.
channel_session_user_from_http
calls transfer_user during the connect method to transfer the HTTP session to the channel session. This way all future calls may access the channel session to retrieve user information.
To your second question, I believe what you are seeing is that the default WebSocket library passes the browser's cookies over the connection.
Third, I think your setup will work quite well once you have changed the decorators.
I ran into this problem and found it was due to a couple of issues that might be the cause. I'm not suggesting this will solve your issue, but it might give you some insight. Keep in mind I am using rest framework. First, I was overriding the User model. Second, when I defined the application variable in my root routing.py, I didn't use my own AuthMiddleware; I was using the docs' suggested AuthMiddlewareStack. So, per the Channels docs, I defined my own custom authentication middleware, which takes my JWT value from the cookies, authenticates it and assigns it to scope["user"], like so:
routing.py

from channels.routing import ProtocolTypeRouter, URLRouter
import app.routing
from .middleware import JsonTokenAuthMiddleware

application = ProtocolTypeRouter(
    {
        "websocket": JsonTokenAuthMiddleware(
            URLRouter(app.routing.websocket_urlpatterns)
        )
    }
)
middleware.py

from http import cookies
from django.contrib.auth.models import AnonymousUser
from django.db import close_old_connections
from rest_framework.authtoken.models import Token
from rest_framework_jwt.authentication import BaseJSONWebTokenAuthentication

class JsonWebTokenAuthenticationFromScope(BaseJSONWebTokenAuthentication):
    def get_jwt_value(self, scope):
        try:
            cookie = next(x for x in scope["headers"]
                          if x[0].decode("utf-8") == "cookie")[1].decode("utf-8")
            return cookies.SimpleCookie(cookie)["JWT"].value
        except:
            return None

class JsonTokenAuthMiddleware(BaseJSONWebTokenAuthentication):
    def __init__(self, inner):
        self.inner = inner

    def __call__(self, scope):
        try:
            close_old_connections()
            user, jwt_value = JsonWebTokenAuthenticationFromScope().authenticate(scope)
            scope["user"] = user
        except:
            scope["user"] = AnonymousUser()
        return self.inner(scope)
Hope this helps!

Python holding a webpage active

I am trying to have a set of Python scripts report their status to a set of microcontrollers.
My idea is to have each Python script create its own webpage that can be viewed by the microcontrollers. But is there any way to have the script itself keep the page served (i.e. an Apache-like library), so that if the script crashes or is not running the page is not served? Or a way to make the page show a default value if the script is not running?
You can also have a look at twisted.web
A very basic example:
from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor

class StatusPageResource(Resource):
    isLeaf = True

    def __init__(self, param1):
        self.param1 = param1
        # Call the constructor of the super class
        Resource.__init__(self)

    def render_GET(self, request):
        return "<html><body>%s</body></html>" % self.param1

my_res = Resource()
my_res.putChild('GetStatusPage1', StatusPageResource(param1='abc'))
my_res.putChild('GetStatusPage2', StatusPageResource(param1='xyz'))

factory = Site(my_res)
reactor.listenTCP(8080, factory)
print 'Running on port 8080'
reactor.run()
Now point your browser to http://localhost:8080/GetStatusPage1 (for example)
You could use http://docs.python.org/library/simplehttpserver.html or some minimal http server framework like http://flask.pocoo.org/ or http://www.cherrypy.org/.
If you want to feed "live" information to your microcontrollers, also have a look at comet-style long-polling requests. You essentially keep downloading "the page" forever and analyse it as a data stream, while the server keeps adding updated info at the "end of the page". A sketch of this idea follows.
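A minimal sketch of that idea with twisted.web (assuming the "status" is just a line of text; NOT_DONE_YET keeps the response open so the server can keep appending to it):

from twisted.web.server import Site, NOT_DONE_YET
from twisted.web.resource import Resource
from twisted.internet import reactor, task

class StreamingStatusResource(Resource):
    isLeaf = True

    def render_GET(self, request):
        # keep the response open and append a status line every second;
        # the client reads the page as a never-ending stream
        loop = task.LoopingCall(request.write, b"status: OK\n")
        loop.start(1.0)
        # stop writing when the client disconnects
        request.notifyFinish().addErrback(lambda _: loop.stop())
        return NOT_DONE_YET

reactor.listenTCP(8081, Site(StreamingStatusResource()))
reactor.run()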

Python Logging monitor through http server

Are there any tools that catch Python logging (socket or HTTP) handlers' reports and serve an HTTP service, so that I can check my logs through a web page?
Thanks
Finally... I found a working server that will do the job...
LoggingWebMonitor
UPDATE
I found Sentry on GitHub. It seems more sophisticated and production-ready.
There are many tools to easily create simple RESTful HTTP web services. My favorite is itty.
from itty import get, run_itty
import glob, gzip, json, os, functools

def jsonify(origfunc):
    @functools.wraps(origfunc)
    def wrapper(*args, **kwds):
        result = origfunc(*args, **kwds)
        return json.dumps(result, indent=4)
    return wrapper

@get('/logs')
@jsonify
def list_logfiles(request):
    return glob.glob('/var/log/myserver/*.gz')

@get('/logs/(?P<name>\w+)')
def show_logfile(request, name):
    fullname = os.path.join('/var/log/myserver', name)
    with gzip.open(fullname, 'rb') as f:
        return f.read()

run_itty(host='localhost', port=8080)
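With the server running, the endpoints can be checked from another terminal (assuming the paths above):
$ curl http://localhost:8080/logs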
I recommend bottle.py; it is a nice framework for cases like this.
Here is the link to the project website: http://bottlepy.org/docs/dev/
Try Splunk; it is simple to set up and has a nice interface. You'd listen to your logfiles or simply send logs to Splunk. It even works remotely for logs on multiple servers. And you can do a lot more than just checking logs.

Python Twisted JSON RPC

Can anyone recommend some simple code to set up a simple JSON RPC client and server using twisted?
I found txJSON-RPC, but I was wondering if someone had experience using some of these and could recommend something.
txJSONRPC is great. I use it and it works. I suggest you give it a try.
SERVER:
from txjsonrpc.web import jsonrpc
from twisted.web import server
from twisted.internet import reactor

class Math(jsonrpc.JSONRPC):
    """
    An example object to be published.
    """
    def jsonrpc_add(self, a, b):
        """
        Return sum of arguments.
        """
        return a + b

reactor.listenTCP(7080, server.Site(Math()))
reactor.run()
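With the server running, it can also be exercised with a plain HTTP POST (a sketch; the exact wire format accepted depends on the txjsonrpc version):
$ curl -d '{"method": "add", "params": [3, 5], "id": 1}' http://127.0.0.1:7080/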
CLIENT:
from twisted.internet import reactor
from txjsonrpc.web.jsonrpc import Proxy

def printValue(value):
    print "Result: %s" % str(value)

def printError(error):
    print 'error', error

def shutDown(data):
    print "Shutting down reactor..."
    reactor.stop()

proxy = Proxy('http://127.0.0.1:7080/')
d = proxy.callRemote('add', 3, 5)
d.addCallback(printValue).addErrback(printError).addBoth(shutDown)
reactor.run()
As a bonus, I will leave an alternative: amp.
http://amp-protocol.net
If you are looking for a framework-independent approach, this lib I pushed (using a mixin) might be helpful:
Cyclone, a Tornado async web server implementation written using twisted, has a built-in json-rpc request handler that uses the python json/simplejson module. Example server and client code is here.
Wikipedia has a bunch of implementations listed for Python: https://en.wikipedia.org/wiki/JSON-RPC#Implementations
That said, txjason feels like the one best integrated with Twisted. It seems to support out-of-order responses out of the box, for example. Most of it would be portable to Python 3 using six. The most horrible part is the parameter validation, which is not exposed in the normal public API anyway.
Speaking of the client, for me this worked better than the libraries:

import json
from twisted.internet import reactor
from twisted.web.client import getPage

TESTDATA = {'id': 1234,
            'method': 'getbalance',
            }
URL = b'http://localhost:7777'

d = getPage(URL, method=b"POST", postdata=json.dumps(TESTDATA).encode("utf-8"))
d.addBoth(lambda x: print(json.loads(x)))
reactor.run()
