How to detect HTTP Request in python + twisted?

I am learning network programming using Twisted 10 in Python. In the code below, is there any way to detect an HTTP request when data is received? Can I also retrieve the domain name, subdomain, and port values from it, and discard the data if it is not HTTP?
from twisted.internet import stdio, reactor, protocol
from twisted.protocols import basic
import re

class DataForwardingProtocol(protocol.Protocol):
    def __init__(self):
        self.output = None
        self.normalizeNewlines = False

    def dataReceived(self, data):
        if self.normalizeNewlines:
            data = re.sub(r"(\r\n|\n)", "\r\n", data)
        if self.output:
            self.output.write(data)

class StdioProxyProtocol(DataForwardingProtocol):
    def connectionMade(self):
        inputForwarder = DataForwardingProtocol()
        inputForwarder.output = self.transport
        inputForwarder.normalizeNewlines = True
        stdioWrapper = stdio.StandardIO(inputForwarder)
        self.output = stdioWrapper
        print "Connected to server. Press ctrl-C to close connection."

class StdioProxyFactory(protocol.ClientFactory):
    protocol = StdioProxyProtocol

    def clientConnectionLost(self, transport, reason):
        reactor.stop()

    def clientConnectionFailed(self, transport, reason):
        print reason.getErrorMessage()
        reactor.stop()

if __name__ == '__main__':
    import sys
    if not len(sys.argv) == 3:
        print "Usage: %s host port" % __file__
        sys.exit(1)
    reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory())
    reactor.run()

protocol.dataReceived, which you're overriding, is too low-level to serve the purpose without smart buffering that you're not doing -- per the docs I just quoted:

Called whenever data is received. Use this method to translate to a higher-level message. Usually, some callback will be made upon the receipt of each complete protocol message.

Parameters: data -- a string of indeterminate length. Please keep in mind that you will probably need to buffer some data, as partial (or multiple) protocol messages may be received! I recommend that unit tests for protocols call through to this method with differing chunk sizes, down to one byte at a time.

You appear to be completely ignoring this crucial part of the docs.
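For illustration only, a minimal buffering sketch (the class and variable names are mine, not the asker's code) that waits for the blank line ending the header block before deciding whether the data looks like HTTP might be:

from twisted.internet import protocol

class HTTPDetectingProtocol(protocol.Protocol):
    def connectionMade(self):
        self._buffer = ""

    def dataReceived(self, data):
        self._buffer += data
        if "\r\n\r\n" not in self._buffer:
            return  # header block not complete yet, keep buffering
        request_line = self._buffer.split("\r\n", 1)[0]
        parts = request_line.split()
        # a plausible request line looks like "GET /path HTTP/1.1"
        if len(parts) == 3 and parts[2].startswith("HTTP/"):
            print "Looks like HTTP:", request_line
        else:
            print "Not HTTP, discarding"
            self.transport.loseConnection()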
You could instead use LineReceiver.lineReceived (inheriting from protocols.basic.LineReceiver, of course) to take advantage of the fact that HTTP requests come in "lines" -- you'll still need to join up headers that are being sent as multiple lines, since, as this tutorial says:

Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.
Once you have a nicely formatted/parsed request (consider studying twisted.web's sources to see one way it could be done), then as for

retrieve Domain name, Sub Domain, Port values from this?

the Host header (cf. RFC 2616, section 14.23) is the one containing this info.
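A hedged sketch of that approach (the class name and the port-80 default are my own illustration, not the asker's code): buffer header lines, unfold continuations, then pull the host and port out of the Host header.

from twisted.protocols import basic

class HTTPRequestSniffer(basic.LineReceiver):
    delimiter = "\r\n"

    def connectionMade(self):
        self.headers = []

    def lineReceived(self, line):
        if line == "":                       # blank line ends the header block
            self.headersComplete()
            return
        if line[0] in " \t" and self.headers:
            # folded continuation: belongs to the previous header line
            self.headers[-1] += " " + line.strip()
        else:
            self.headers.append(line)

    def headersComplete(self):
        for header in self.headers[1:]:      # skip the request line itself
            name, _, value = header.partition(":")
            if name.lower() == "host":
                host, _, port = value.strip().partition(":")
                print "Host:", host, "Port:", port or "80"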

Just based on what you seem to be attempting, I think the following would be the path of least resistance:
http://twistedmatrix.com/documents/10.0.0/api/twisted.web.proxy.html
That's the Twisted class for building an HTTP proxy. It will let you intercept the requests, look at the destination and look at the sender. You can also look at all the headers and the content going back and forth. You seem to be trying to re-write the HTTP protocol and proxy classes that Twisted has already provided for you. I hope this helps.
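For example, a minimal sketch (assuming the twisted.web.proxy API of that era; the class names here are mine, not from either answer) that logs the destination host of every request before proxying it:

from twisted.web import http, proxy
from twisted.internet import reactor

class LoggingProxyRequest(proxy.ProxyRequest):
    def process(self):
        # inspect the request before the proxy forwards it
        print "Request for", self.getHeader("host"), self.uri
        proxy.ProxyRequest.process(self)

class LoggingProxy(proxy.Proxy):
    requestFactory = LoggingProxyRequest

class LoggingProxyFactory(http.HTTPFactory):
    protocol = LoggingProxy

reactor.listenTCP(8080, LoggingProxyFactory())
reactor.run()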

Related

how to limit tornado websocket message size

I have written a websocket server in Tornado, and the on_message method is called when a message is received. The problem is that the message size is unlimited by default; in other words, the server is open to attack by a client sending huge data (messages) to the websocket and filling the server-side memory. There has to be an option to put a limit on incoming message size, is there? If not, what do I have to do to avoid such a bug? Here is my code to accept only messages less than 128 bytes in length, but it doesn't seem to work.
import tornado.httpserver
import tornado.ioloop
import tornado.web
import tornado.websocket

class ClientWebSocketConnectionHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        print "Connection is opened"
    def on_message(self, message):
        print message
    def on_close(self):
        print "closed"

class MainApplication(tornado.web.Application):
    def __init__(self):
        handlers = [(r'/', ClientWebSocketConnectionHandler)]
        tornado.web.Application.__init__(self, handlers)

TheShieldsWebSocket = MainApplication()
server = tornado.httpserver.HTTPServer(TheShieldsWebSocket, max_body_size=128)
server.listen(8080)
tornado.ioloop.IOLoop.current().start()
Since version 4.5 Tornado will close the connection automatically if it receives more than 10 MiB of data in a single websocket frame (message). So, you don't have to worry about someone sending huge data in a single message. You can see this in the source code. It's also mentioned in the docs of WebsocketHandler in the second-last paragraph.
If you'd like to change the default frame limit you can pass your Application class an argument called websocket_max_message_size with the size in bytes.
app = tornado.web.Application(
    # your handlers etc,
    websocket_max_message_size=128,
)
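For completeness, a runnable sketch combining the question's handler with that setting (the 128-byte limit is just the example value from the question) might look like:

import tornado.ioloop
import tornado.web
import tornado.websocket

class ClientWebSocketConnectionHandler(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # messages larger than websocket_max_message_size close the connection
        print message

app = tornado.web.Application(
    [(r'/', ClientWebSocketConnectionHandler)],
    websocket_max_message_size=128,
)
app.listen(8080)
tornado.ioloop.IOLoop.current().start()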
Take a look at the documentation here:
http://www.tornadoweb.org/en/stable/http1connection.html#tornado.http1connection.HTTP1Connection.set_max_body_size
To paraphrase, in case the link goes stale:
set_max_body_size(max_body_size)
Sets the body size limit for a single request.
Overrides the value from HTTP1ConnectionParameters.

Am I using classes and implementing functionality correctly? [closed]

I have to create a listening server that will receive HTTP POST / XML alert traffic from a network sensor and parse out the received XML. Being a beginner to Python and having a tough time understanding classes, I wanted to get advice on whether I'm implementing the classes and functionality correctly, and whether there's a better or "more Pythonic" way of doing it. I'm forcing myself to use classes in hopes that I better grasp the concept; I know I could just use regular functions.
The script so far:
I'm using the BaseHTTPServer and SocketServer modules to create a threaded HTTP server, and xml.dom.minidom for parsing the XML data. So far I have two classes set up - one to set up the threading (ThreadedHTTPServer) and another with everything else (ThreadedHTTPRequestHandler). The "everything else" class is currently managing the sessions and manipulating the data. I'm thinking I need three classes, breaking out the data manipulation into the third and leaving the second just to manage the inbound connections. Would this be correct? How would I pass the connection data from the ThreadedHTTPRequestHandler class to the new class that will be parsing and manipulating the XML data?
Any help for this newbie would be appreciated. Code is below, and it's currently working. All it does at this time is accept incoming connections and prints the XML of a specific tag I'm interested in.
import cgi
from xml.dom.minidom import parseString
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from SocketServer import ThreadingMixIn

# Server settings
HOST = ''
PORT = 5000
BUFF = 2048

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """
    This class sets up multi-threading for the server
    """
    pass

class ThreadedHTTPRequestHandler(BaseHTTPRequestHandler):
    '''
    This class is the overall request handler.
    This class contains functions to manage client connections and manipulate data.
    '''
    def do_POST(self):
        '''
        This method handles the inbound HTTP POST data
        '''
        print 'Connection from: ', self.client_address[0], self.client_address[1]
        ctype = self.headers.getheader('content-type')
        content_len = int(self.headers.getheader('content-length'))
        if ctype == 'multipart/form-data':
            self.post_body = cgi.parse_multipart(self.rfile)
        elif ctype == 'application/x-www-form-urlencoded':
            self.post_body = self.rfile.read(content_len)
        else:
            self.post_body = ""
        self.done(200)
        self.handleXML()

    def done(self, code):
        '''
        Send back an HTTP 200 OK and close the connection
        '''
        try:
            self.send_response(code)
            self.end_headers()
        except:
            pass
        print 'Connection from: ', self.client_address[0], self.client_address[1], ' closed.'

    #class XMLHandler():
    def handleXML(self):
        '''
        This method parses and manipulates the XML alert data
        '''
        xml_dom = parseString(self.post_body)
        xmlTag = xml_dom.getElementsByTagName('malware')[0].toxml()
        # print out the xml tag and data in this format: <tag>data</tag>
        print xmlTag

if __name__ == "__main__":
    try:
        server = ThreadedHTTPServer((HOST, PORT), ThreadedHTTPRequestHandler).serve_forever()
        print
    except KeyboardInterrupt:
        pass
You don't necessarily need a third class. What you need is a freestanding function,
def handle_xml(post_body):
    # work
so that you no longer need to store the post_body on the ThreadedHTTPRequestHandler.
Class hierarchies are a good fit for some problems, and a bad fit for most. Don't use them if you don't need to, they'll just complicate your code.
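A sketch of that refactor applied to the question's handler (it reuses the question's imports and its done() helper; only the changed pieces are shown):

def handle_xml(post_body):
    # freestanding: no handler instance state needed
    xml_dom = parseString(post_body)
    print xml_dom.getElementsByTagName('malware')[0].toxml()

class ThreadedHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_len = int(self.headers.getheader('content-length'))
        post_body = self.rfile.read(content_len)
        self.done(200)          # the question's helper, unchanged
        handle_xml(post_body)   # pass the data along instead of storing it on self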

Python - Twisted, Proxy and modifying content

So I've looked around at a few things involving writing an HTTP proxy using Python and the Twisted framework.
Essentially, like some other questions, I'd like to be able to modify the data that will be sent back to the browser. That is, the browser requests a resource and the proxy will fetch it. Before the resource is returned to the browser, I'd like to be able to modify ANY content (HTTP headers AND body).
This ( Need help writing a twisted proxy ) was what I initially found. I tried it out, but it didn't work for me. I also found this ( Python Twisted proxy - how to intercept packets ), which I thought would work; however, I can only see the HTTP requests from the browser.
I am looking for any advice. Some thoughts I have are to use the ProxyClient and ProxyRequest classes and override the functions, but I read that the Proxy class itself is a combination of the both.
For those who may ask to see some code, it should be noted that I have worked with only the above two examples. Any help is great.
Thanks.
To create a ProxyFactory that can modify the server's response headers and content, you could override the ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    """Mangle returned header, content here.

    Use `self.father` methods to modify request directly.
    """
    def handleHeader(self, key, value):
        # change response header here
        log.msg("Header: %s: %s" % (key, value))
        proxy.ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        # change response part here
        log.msg("Content: %s" % (buffer[:50],))
        # make all content upper case
        proxy.ProxyClient.handleResponsePart(self, buffer.upper())

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    protocols = dict(http=ProxyClientFactory)

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy
I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost"  # serve on localhost:8080

if __name__ == '__main__':  # $ python proxy_modify_request.py
    import sys
    from twisted.internet import endpoints, reactor

    def shutdown(reason, reactor, stopping=[]):
        """Stop the reactor."""
        if stopping:
            return
        stopping.append(True)
        if reason:
            log.msg(reason.value)
        reactor.callWhenRunning(reactor.stop)

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    d.addErrback(shutdown, reactor)
    reactor.run()
else:  # $ twistd -ny proxy_modify_request.py
    from twisted.application import service, strports
    application = service.Application("proxy_modify_request")
    strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
For a two-way proxy using Twisted, see this article:
http://sujitpal.blogspot.com/2010/03/http-debug-proxy-with-twisted.html

How to implement a two way jsonrpc + twisted server/client

Hello, I am working on developing an RPC server based on Twisted to serve several microcontrollers which make RPC calls to the Twisted JSON-RPC server. But the application also requires that the server send information to each micro at any time, so the question is: what would be a good way to prevent the response to a remote JSON-RPC call from a micro from being confused with a server JSON-RPC request made on behalf of a user?
The problem I am having now is that the micros receive bad information, because they don't know whether the netstring/JSON string coming from the socket is the response to a previous request or a new request from the server.
Here is my code:
from twisted.internet import reactor
from txjsonrpc.netstring import jsonrpc
import weakref

creds = {'user1': 'pass1', 'user2': 'pass2', 'user3': 'pass3'}

class arduinoRPC(jsonrpc.JSONRPC):
    def connectionMade(self):
        pass

    def jsonrpc_identify(self, username, password, mac):
        """ Each client must be authenticated just after connecting, by calling this rpc """
        if creds.has_key(username):
            if creds[username] == password:
                authenticated = True
            else:
                authenticated = False
        else:
            authenticated = False
        if authenticated:
            self.factory.clients.append(self)
            self.factory.references[mac] = weakref.ref(self)
            return {'results': 'Authenticated as %s' % username, 'error': None}
        else:
            self.transport.loseConnection()

    def jsonrpc_sync_acq(self, data, f):
        """Save into django table data acquired from sensors and send ack to gateway"""
        if not (self in self.factory.clients):
            self.transport.loseConnection()
        print f
        return {'results': 'synced %s records' % len(data), 'error': 'null'}

    def connectionLost(self, reason):
        """ mac address is searched and all references to self.factory.clients are erased """
        for mac in self.factory.references.keys():
            if self.factory.references[mac]() == self:
                print 'Connection closed - Mac address: %s' % mac
                del self.factory.references[mac]
                self.factory.clients.remove(self)

class rpcfactory(jsonrpc.RPCFactory):
    protocol = arduinoRPC

    def __init__(self, maxLength=1024):
        self.maxLength = maxLength
        self.subHandlers = {}
        self.clients = []
        self.references = {}

""" Asynchronous remote calling to micros, simulating random calling from server """
import threading, time, random, netstring, json

class asyncGatewayCalls(threading.Thread):
    def __init__(self, rpcfactory):
        threading.Thread.__init__(self)
        self.rpcfactory = rpcfactory
        """identifiers of each micro/client connected"""
        self.remoteMacList = ['12:23:23:23:23:23:23', '167:67:67:67:67:67:67', '90:90:90:90:90:90:90']

    def run(self):
        while True:
            time.sleep(10)
            while True:
                """ call to any of three potential micros connected """
                mac = self.remoteMacList[random.randrange(0, len(self.remoteMacList))]
                if self.rpcfactory.references.has_key(mac):
                    print 'Calling %s' % mac
                    proto = self.rpcfactory.references[mac]()
                    """ requesting echo from selected micro """
                    dataToSend = netstring.encode(json.dumps({'method': 'echo_from_micro', 'params': ['plop']}))
                    proto.transport.write(dataToSend)
                    break

factory = rpcfactory(arduinoRPC)

"""start thread caller"""
r = asyncGatewayCalls(factory)
r.start()

reactor.listenTCP(7080, factory)
print "Micros remote RPC server started"
reactor.run()
You need to add enough information to each message so that the recipient can determine how to interpret it. Your requirements sound very similar to those of AMP, so you could either use AMP instead or use the same structure as AMP to identify your messages. Specifically:
In requests, put a particular key - for example, AMP uses "_ask" to identify requests. It also gives these a unique value, which further identifies that request for the lifetime of the connection.
In responses, put a different key - for example, AMP uses "_answer" for this. The value matches up with the value from the "_ask" key in the request the response is for.
Using an approach like this, you just have to look to see whether there is an "_ask" key or an "_answer" key to determine if you've received a new request or a response to a previous request.
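A minimal sketch of that tagging scheme over JSON (the _ask/_answer key names follow the answer; the helper functions are mine for illustration):

import json
import itertools

_counter = itertools.count(1)

def make_request(method, params):
    """Outgoing request, tagged so the reply can be matched to it."""
    return json.dumps({'_ask': next(_counter), 'method': method, 'params': params})

def make_response(ask_id, result):
    """Reply to a request, echoing the tag from its '_ask' key."""
    return json.dumps({'_answer': ask_id, 'result': result})

def dispatch(raw):
    """Decide whether an incoming message is a new request or a reply."""
    msg = json.loads(raw)
    if '_ask' in msg:
        return 'request', msg
    if '_answer' in msg:
        return 'response', msg
    return 'unknown', msg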
On a separate topic, your asyncGatewayCalls class shouldn't be thread-based. There's no apparent reason for it to use threads, and by doing so it is also misusing Twisted APIs in a way which will lead to undefined behavior. Most Twisted APIs can only be used in the thread in which you called reactor.run. The only exception is reactor.callFromThread, which you can use to send a message to the reactor thread from any other thread. asyncGatewayCalls tries to write to a transport, though, which will lead to buffer corruption or arbitrary delays in the data being sent, or perhaps worse things. Instead, you can write asyncGatewayCalls like this:
from twisted.internet.task import LoopingCall

class asyncGatewayCalls(object):
    def __init__(self, rpcfactory):
        self.rpcfactory = rpcfactory
        self.remoteMacList = [...]

    def run(self):
        self._call = LoopingCall(self._pokeMicro)
        return self._call.start(10)

    def _pokeMicro(self):
        while True:
            mac = self.remoteMacList[...]
            if mac in self.rpcfactory.references:
                proto = ...
                dataToSend = ...
                proto.transport.write(dataToSend)
                break

factory = ...
r = asyncGatewayCalls(factory)
r.run()
reactor.listenTCP(7080, factory)
reactor.run()
This gives you a single-threaded solution which should have the same behavior as you intended for the original asyncGatewayCalls class. Instead of sleeping in a loop in a thread in order to schedule the calls, though, it uses the reactor's scheduling APIs (via the higher-level LoopingCall class, which schedules things to be called repeatedly) to make sure _pokeMicro gets called every ten seconds.
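If a genuinely blocking operation ever does force you into a real thread, the one safe way to hand the write back to Twisted is reactor.callFromThread; a hedged sketch, separate from the LoopingCall solution above:

import threading
from twisted.internet import reactor

def blocking_producer(proto, dataToSend):
    # runs in a worker thread; it never touches the transport directly,
    # it asks the reactor thread to do the write instead
    reactor.callFromThread(proto.transport.write, dataToSend)

# threading.Thread(target=blocking_producer, args=(proto, dataToSend)).start()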

Writing a blocking wrapper around twisted's IRC client

I'm trying to write a dead-simple interface for an IRC library, like so:
import re
import simpleirc

connection = simpleirc.Connect('irc.freenode.net', 6667)
channel = connection.join('foo')
find_command = re.compile(r'google ([a-z]+)').findall
for msg in channel:
    for t in find_command(msg):
        channel.say("http://google.com/search?q=%s" % t)
Working from their example, I'm running into trouble (code is a bit lengthy, so I pasted it here). Since the call to channel.__next__ needs to be returned when the callback <IRCClient instance>.privmsg is called, there doesn't seem to be a clean option. Using exceptions or threads seems like the wrong thing here, is there a simpler (blocking?) way of using twisted that would make this possible?
In general, if you're trying to use Twisted in a "blocking" way, you're going to run into a lot of difficulties, because that's neither the way it's intended to be used, nor the way in which most people use it.
Going with the flow is generally a lot easier, and in this case, that means embracing callbacks. The callback-style solution to your question would look something like this:
import re

from twisted.internet import reactor, protocol
from twisted.words.protocols import irc

find_command = re.compile(r'google ([a-z]+)').findall

class Googler(irc.IRCClient):
    def privmsg(self, user, channel, message):
        for text in find_command(message):
            self.say(channel, "http://google.com/search?q=%s" % (text,))

def connect():
    cc = protocol.ClientCreator(reactor, Googler)
    return cc.connectTCP(host, port)

def run(proto):
    proto.join(channel)

def main():
    d = connect()
    d.addCallback(run)
    reactor.run()
This isn't absolutely required (but I strongly suggest you consider trying it). One alternative is inlineCallbacks:
import re

from twisted.internet import reactor, protocol, defer
from twisted.words.protocols import irc

find_command = re.compile(r'google ([a-z]+)').findall

class Googler(irc.IRCClient):
    def privmsg(self, user, channel, message):
        for text in find_command(message):
            self.say(channel, "http://google.com/search?q=%s" % (text,))

@defer.inlineCallbacks
def run():
    cc = protocol.ClientCreator(reactor, Googler)
    proto = yield cc.connectTCP(host, port)
    proto.join(channel)

def main():
    run()
    reactor.run()
Notice no more addCallbacks. It's been replaced by yield in a decorated generator function. This could get even closer to what you asked for if you had a version of Googler with a different API (the one above should work with IRCClient from Twisted as it is written - though I didn't test it). It would be entirely possible for Googler.join to return a Channel object of some sort, and for that Channel object to be iterable like this:
@defer.inlineCallbacks
def run():
    cc = protocol.ClientCreator(reactor, Googler)
    proto = yield cc.connectTCP(host, port)
    channel = proto.join(channel)
    for msg in channel:
        msg = yield msg
        for text in find_command(msg):
            channel.say("http://google.com/search?q=%s" % (text,))
It's only a matter of implementing this API on top of the ones already present. Of course, the yield expressions are still there, and I don't know how much this will upset you. ;)
It's possible to go still further away from callbacks and make the context switches necessary for asynchronous operation to work completely invisible. This is bad for the same reason it would be bad for sidewalks outside your house to be littered with invisible bear traps. However, it's possible. Using something like corotwine, itself based on a third-party coroutine library for CPython, you can have the implementation of Channel do the context switching itself, rather than requiring the calling application code to do it. The result might look something like:
from corotwine import protocol

def run():
    proto = Googler()
    transport = protocol.gConnectTCP(host, port)
    proto.makeConnection(transport)
    channel = proto.join(channel)
    for msg in channel:
        for text in find_command(msg):
            channel.say("http://google.com/search?q=%s" % (text,))
with an implementation of Channel that might look something like:
from corotwine import defer

class Channel(object):
    def __init__(self, ircClient, name):
        self.ircClient = ircClient
        self.name = name

    def __iter__(self):
        while True:
            d = self.ircClient.getNextMessage(self.name)
            message = defer.blockOn(d)
            yield message
This in turn depends on a new Googler method, getNextMessage, which is a straightforward feature addition based on existing IRCClient callbacks:
from twisted.internet import defer

class Googler(irc.IRCClient):
    def connectionMade(self):
        irc.IRCClient.connectionMade(self)
        self._nextMessages = {}

    def getNextMessage(self, channel):
        if channel not in self._nextMessages:
            self._nextMessages[channel] = defer.DeferredQueue()
        return self._nextMessages[channel].get()

    def privmsg(self, user, channel, message):
        if channel not in self._nextMessages:
            self._nextMessages[channel] = defer.DeferredQueue()
        self._nextMessages[channel].put(message)
To run this, you create a new greenlet for the run function and switch to it, and then start the reactor.
from greenlet import greenlet

def main():
    greenlet(run).switch()
    reactor.run()
When run gets to its first asynchronous operation, it switches back to the reactor greenlet (which is the "main" greenlet in this case, but it doesn't really matter) to let the asynchronous operation complete. When it completes, corotwine turns the callback into a greenlet switch back into run. So run is granted the illusion of running straight through, like a "normal" synchronous program. Keep in mind that it is just an illusion, though.
So, it's possible to get as far away from the callback-oriented style that is most commonly used with Twisted as you want. It's not necessarily a good idea, though.
