Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
Improve this question
I have to create a listening server that will receive HTTP POST / XML alert traffic from a network sensor and parse out the received XML. Being a beginner to Python, and having a tough time understanding classes, I wanted to get advice on if I'm implementing the classes and functionality I'm trying to achieve correctly, and if there's a better or "more Pythonic" way of doing it. I'm forcing myself to use classes in hopes that I better grasp the concept, I know I can just use regular functions.
The script so far:
I'm using the BaseHTTPServer and SocketServer module to create a threaded HTTP server, and xml.dom.minidom class for parsing the XML data. So far I have two classes set up - one to setup the threading (ThreadedHTTPServer) and another with everything else (ThreadedHTTPRequestHandler). The "everything else" class is currently managing the sessions and manipulating the data. I'm thinking I need three classes, breaking out the data manipulation into the third and leaving the second just to managing the inbound connections. Would this be correct? How would I pass the connection data from the ThreadedHTTPRequestHandler class to the new class that will be parsing and manipulating the XML data?
Any help for this newbie would be appreciated. Code is below, and it's currently working. All it does at this time is accept incoming connections and prints the XML of a specific tag I'm interested in.
import cgi
from xml.dom.minidom import parseString
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
from SocketServer import ThreadingMixIn
# Server settings
HOST = ''
PORT = 5000
BUFF = 2048
class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
"""
This class sets up multi-threading for the server
"""
pass
class ThreadedHTTPRequestHandler(BaseHTTPRequestHandler):
'''
This class is the overall request handler.
This class contains functions to manage client connections and manipulate data.
'''
def do_POST(self):
'''
This method handles the inbound HTTP POST data
'''
print 'Connection from: ', self.client_address[0], self.client_address[1]
ctype = self.headers.getheader('content-type')
content_len = int(self.headers.getheader('content-length'))
if ctype == 'multipart/form-data':
self.post_body = cgi.parse_multipart(self.rfile)
elif ctype == 'application/x-www-form-urlencoded':
self.post_body = self.rfile.read(content_len)
else:
self.post_body = ""
self.done(200)
self.handleXML()
def done(self, code):
'''
Send back an HTTP 200 OK and close the connection
'''
try:
self.send_response(code)
self.end_headers()
except:
pass
print 'Connection from: ', self.client_address[0], self.client_address[1], ' closed.'
#class XMLHandler():
def handleXML(self):
'''
This method parses and manipulates the XML alert data
'''
xml_dom = parseString(self.post_body)
xmlTag = xml_dom.getElementsByTagName('malware')[0].toxml()
#print out the xml tag and data in this format: <tag>data</tag>
print xmlTag
if __name__ == "__main__":
try:
server = ThreadedHTTPServer((HOST, PORT), ThreadedHTTPRequestHandler).serve_forever()
print
except KeyboardInterrupt:
pass
You don't necessarily need a third class. What you need is a freestanding function,
def handle_xml(post_body):
# work
so that you no longer need to store the post_body on the ThreadedHTTPRequestHandler.
Class hierarchies are a good fit for some problems, and a bad fit for most. Don't use them if you don't need to, they'll just complicate your code.
Related
I have an own class derived from BaseHTTPRequestHandler, which implements my specific GET method. This works quite fine:
from http.server import BaseHTTPRequestHandler, HTTPServer
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
def do_GET(self):
""" my implementation of the GET method """
myServer = HTTPServer(("127.0.0.1", 8099), MyHTTPRequestHandler)
myServer.handle_request()
But why do I need to pass my class MyHTTPRequestHandler to the HTTPServer? I know that it is required by documentation:
class http.server.BaseHTTPRequestHandler(request, client_address, server)
This class is used to handle the HTTP requests that arrive at the server. By itself, it cannot respond to any actual HTTP requests; it
must be subclassed to handle each request method (e.g. GET or POST).
BaseHTTPRequestHandler provides a number of class and instance
variables, and methods for use by subclasses.
The handler will parse the request and the headers, then call a method specific to the request type. The method name is constructed
from the request. For example, for the request method SPAM, the
do_SPAM() method will be called with no arguments. All of the relevant
information is stored in instance variables of the handler. Subclasses
should not need to override or extend the init() method.
But I do want to pass an instantiated object of my subclass instead. I don't understand why this has been designed like that and it looks like design failure to me. The purpose of object oriented programming with polymorphy is that I can subclass to implement a specific behavior with the same interfaces, so this seems to me as an unnecessary restriction.
That is what I want:
from http.server import BaseHTTPRequestHandler, HTTPServer
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
def __init__(self, myAdditionalArg):
self.myArg = myAdditionalArg
def do_GET(self):
""" my implementation of the GET method """
self.wfile(bytes(self.myArg, "utf-8"))
# ...
myReqHandler = MyHTTPRequestHandler("mySpecificString")
myServer = HTTPServer(("127.0.0.1", 8099), myReqHandler)
myServer.handle_request()
But if I do that, evidently I receive the expected error message:
TypeError: 'MyHTTPRequestHandler' object is not callable
How can I workaround this so that I can still use print a specific string?
There is also a 2nd reason why I need this: I want that MyHTTPRequestHandler provides also more information about the client, which uses the GET method to retrieve data from the server (I want to retrieve the HTTP-Header of the client browser).
I just have one client which starts a single request to the server. If a solution would work in a more general context, I'll be happy, but I won't
need it for my current project.
Somebody any idea to do that?
A server needs to create request handlers as needed to handle all the requests coming in. It would be bad design to only have one request handler. If you passed in an instance, the server could only ever handle one request at a time and if there were any side effects, it would go very very badly. Any sort of change of state is beyond the scope of what a request handler should do.
BaseHTTPRequestHandler has a method to handle message logging, and an attribute self.headers containing all the header information. It defaults to logging messages to sys.stderr, so you could do $ python -m my_server.py 2> log_file.txt to capture the log messages. or, you could write to file in your own handler.
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
log_file = os.path.join(os.path.abspath(''), 'log_file.txt') # saves in directory where server is
myArg = 'some fancy thing'
def do_GET(self):
# do things
self.wfile.write(bytes(self.myArg,'utf-8'))
# do more
msg_format = 'start header:\n%s\nend header\n'
self.log_message(msg_format, self.headers)
def log_message(self, format_str, *args): # This is basically a copy of original method, just changing the destination
with open(self.log_file, 'a') as logger:
logger.write("%s - - [%s] %s\n" %
self.log_date_time_string(),
format%args))
handler = MyHTTPRequestHandler
myServer = HTTPServer(("127.0.0.1", 8099), handler)
It is possible to derive a specific HTTPServer class (MyHttpServer), which has the following attributes:
myArg: the specific "message text" which shall be printed by the HTTP
request handler
headers: a dictionary storing the headers set by a
HTTP request handler
The server class must be packed together with MyHTTPRequestHandler. Furthermore the implementation is only working properly under the following conditions:
only one HTTP request handler requests an answer from the server at the same time (otherwise data kept by the attributes are corrupted)
MyHTTPRequestHandler is only used with MyHttpServer and vice versa (otherwise unknown side effects like exceptions or data corruption can occur)
That's why both classes must be packed and shipped together in a way like this:
from http.server import BaseHTTPRequestHandler, HTTPServer
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
def do_GET(self):
self.wfile.write(bytes(self.server.myArg, 'utf-8'))
#...
self.server.headers = self.headers
class MyHttpServer(HTTPServer):
def __init__(self, server_address, myArg, handler_class=MyHttpRequestHandler):
super().__init__(server_address, handler_class)
self.myArg = myArg
self.headers = dict()
The usage of these classes could look like this, whereas only one request of a client (i.e. Web-Browser) is answered by the server:
def get_header():
httpd = MyHttpServer(("127.0.0.1", 8099), "MyFancyText", MyHttpRequestHandler)
httpd.handle_request()
return httpd.headers
Suppose I have the following application:
import cherrypy
class HelloWorld(object):
def my_handler(self, a):
assert a == "foo"
def index(self, a):
# Register parameter a/my_handler/...
return "A" * 100000
index.exposed = True
cherrypy.quickstart(HelloWorld())
And the following client:
import httplib
h = httplib.HTTPConnection('localhost', 8080)
h.request("GET", "/?a=foo")
r = h.getresponse(True)
print r.read(10)
I would like to deal explicitly with the case where a client tears down the connection (either politely or not) and thus hasn't received the response. So, in this example, my_handler should be called because the GET response wasn't delivered before the connection closed.
The CherryPy documentation refers to errors that are a bit higher up the TCP stack than I'm aiming for (c.f. cperror, cprequest's error_response). I dug through some of the CherryPy source, and is some mention of socket errors in e.g. read_request_line and simple response. It is unclear to me whether these are exposed anywhere and, more importantly, if there is a canonical way to hook.
Why, you might ask, would I be interested in such a thing? There's a slight abuse of HTTP semantics and it's important to verify that a GET response was delivered; otherwise some accounting needs to be done.
So i've looked around at a few things involving writting an HTTP Proxy using python and the Twisted framework.
Essentially, like some other questions, I'd like to be able to modify the data that will be sent back to the browser. That is, the browser requests a resource and the proxy will fetch it. Before the resource is returned to the browser, i'd like to be able to modify ANY (HTTP headers AND content) content.
This ( Need help writing a twisted proxy ) was what I initially found. I tried it out, but it didn't work for me. I also found this ( Python Twisted proxy - how to intercept packets ) which i thought would work, however I can only see the HTTP requests from the browser.
I am looking for any advice. Some thoughts I have are to use the ProxyClient and ProxyRequest classes and override the functions, but I read that the Proxy class itself is a combination of the both.
For those who may ask to see some code, it should be noted that I have worked with only the above two examples. Any help is great.
Thanks.
To create ProxyFactory that can modify server response headers, content you could override ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy
class ProxyClient(proxy.ProxyClient):
"""Mangle returned header, content here.
Use `self.father` methods to modify request directly.
"""
def handleHeader(self, key, value):
# change response header here
log.msg("Header: %s: %s" % (key, value))
proxy.ProxyClient.handleHeader(self, key, value)
def handleResponsePart(self, buffer):
# change response part here
log.msg("Content: %s" % (buffer[:50],))
# make all content upper case
proxy.ProxyClient.handleResponsePart(self, buffer.upper())
class ProxyClientFactory(proxy.ProxyClientFactory):
protocol = ProxyClient
class ProxyRequest(proxy.ProxyRequest):
protocols = dict(http=ProxyClientFactory)
class Proxy(proxy.Proxy):
requestFactory = ProxyRequest
class ProxyFactory(http.HTTPFactory):
protocol = Proxy
I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost" # serve on localhost:8080
if __name__ == '__main__': # $ python proxy_modify_request.py
import sys
from twisted.internet import endpoints, reactor
def shutdown(reason, reactor, stopping=[]):
"""Stop the reactor."""
if stopping: return
stopping.append(True)
if reason:
log.msg(reason.value)
reactor.callWhenRunning(reactor.stop)
log.startLogging(sys.stdout)
endpoint = endpoints.serverFromString(reactor, portstr)
d = endpoint.listen(ProxyFactory())
d.addErrback(shutdown, reactor)
reactor.run()
else: # $ twistd -ny proxy_modify_request.py
from twisted.application import service, strports
application = service.Application("proxy_modify_request")
strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
For two-way proxy using twisted see the article:
http://sujitpal.blogspot.com/2010/03/http-debug-proxy-with-twisted.html
Hello I am working on develop a rpc server based on twisted to serve several microcontrollers which make rpc call to twisted jsonrpc server. But the application also required that server send information to each micro at any time, so the question is how could be a good practice to prevent that the response from a remote jsonrpc call from a micro be confused with a server jsonrpc request which is made for a user.
The consequence that I am having now is that micros are receiving bad information, because they dont know if netstring/json string that is comming from socket is their response from a previous requirement or is a new request from server.
Here is my code:
from twisted.internet import reactor
from txjsonrpc.netstring import jsonrpc
import weakref
creds = {'user1':'pass1','user2':'pass2','user3':'pass3'}
class arduinoRPC(jsonrpc.JSONRPC):
def connectionMade(self):
pass
def jsonrpc_identify(self,username,password,mac):
""" Each client must be authenticated just after to be connected calling this rpc """
if creds.has_key(username):
if creds[username] == password:
authenticated = True
else:
authenticated = False
else:
authenticated = False
if authenticated:
self.factory.clients.append(self)
self.factory.references[mac] = weakref.ref(self)
return {'results':'Authenticated as %s'%username,'error':None}
else:
self.transport.loseConnection()
def jsonrpc_sync_acq(self,data,f):
"""Save into django table data acquired from sensors and send ack to gateway"""
if not (self in self.factory.clients):
self.transport.loseConnection()
print f
return {'results':'synced %s records'%len(data),'error':'null'}
def connectionLost(self, reason):
""" mac address is searched and all reference to self.factory.clientes are erased """
for mac in self.factory.references.keys():
if self.factory.references[mac]() == self:
print 'Connection closed - Mac address: %s'%mac
del self.factory.references[mac]
self.factory.clients.remove(self)
class rpcfactory(jsonrpc.RPCFactory):
protocol = arduinoRPC
def __init__(self, maxLength=1024):
self.maxLength = maxLength
self.subHandlers = {}
self.clients = []
self.references = {}
""" Asynchronous remote calling to micros, simulating random calling from server """
import threading,time,random,netstring,json
class asyncGatewayCalls(threading.Thread):
def __init__(self,rpcfactory):
threading.Thread.__init__(self)
self.rpcfactory = rpcfactory
"""identifiers of each micro/client connected"""
self.remoteMacList = ['12:23:23:23:23:23:23','167:67:67:67:67:67:67','90:90:90:90:90:90:90']
def run(self):
while True:
time.sleep(10)
while True:
""" call to any of three potential micros connected """
mac = self.remoteMacList[random.randrange(0,len(self.remoteMacList))]
if self.rpcfactory.references.has_key(mac):
print 'Calling %s'%mac
proto = self.rpcfactory.references[mac]()
""" requesting echo from selected micro"""
dataToSend = netstring.encode(json.dumps({'method':'echo_from_micro','params':['plop']}))
proto.transport.write(dataToSend)
break
factory = rpcfactory(arduinoRPC)
"""start thread caller"""
r=asyncGatewayCalls(factory)
r.start()
reactor.listenTCP(7080, factory)
print "Micros remote RPC server started"
reactor.run()
You need to add a enough information to each message so that the recipient can determine how to interpret it. Your requirements sounds very similar to those of AMP, so you could either use AMP instead or use the same structure as AMP to identify your messages. Specifically:
In requests, put a particular key - for example, AMP uses "_ask" to identify requests. It also gives these a unique value, which further identifies that request for the lifetime of the connection.
In responses, put a different key - for example, AMP uses "_answer" for this. The value matches up with the value from the "_ask" key in the request the response is for.
Using an approach like this, you just have to look to see whether there is an "_ask" key or an "_answer" key to determine if you've received a new request or a response to a previous request.
On a separate topic, your asyncGatewayCalls class shouldn't be thread-based. There's no apparent reason for it to use threads, and by doing so it is also misusing Twisted APIs in a way which will lead to undefined behavior. Most Twisted APIs can only be used in the thread in which you called reactor.run. The only exception is reactor.callFromThread, which you can use to send a message to the reactor thread from any other thread. asyncGatewayCalls tries to write to a transport, though, which will lead to buffer corruption or arbitrary delays in the data being sent, or perhaps worse things. Instead, you can write asyncGatewayCalls like this:
from twisted.internet.task import LoopingCall
class asyncGatewayCalls(object):
def __init__(self, rpcfactory):
self.rpcfactory = rpcfactory
self.remoteMacList = [...]
def run():
self._call = LoopingCall(self._pokeMicro)
return self._call.start(10)
def _pokeMicro(self):
while True:
mac = self.remoteMacList[...]
if mac in self.rpcfactory.references:
proto = ...
dataToSend = ...
proto.transport.write(dataToSend)
break
factory = ...
r = asyncGatewayCalls(factory)
r.run()
reactor.listenTCP(7080, factory)
reactor.run()
This gives you a single-threaded solution which should have the same behavior as you intended for the original asyncGatewayCalls class. Instead of sleeping in a loop in a thread in order to schedule the calls, though, it uses the reactor's scheduling APIs (via the higher-level LoopingCall class, which schedules things to be called repeatedly) to make sure _pokeMicro gets called every ten seconds.
I am learning network programming using twisted 10 in python. In below code is there any way to detect HTTP Request when data recieved? also retrieve Domain name, Sub Domain, Port values from this? Discard it if its not http data?
from twisted.internet import stdio, reactor, protocol
from twisted.protocols import basic
import re
class DataForwardingProtocol(protocol.Protocol):
def _ _init_ _(self):
self.output = None
self.normalizeNewlines = False
def dataReceived(self, data):
if self.normalizeNewlines:
data = re.sub(r"(\r\n|\n)", "\r\n", data)
if self.output:
self.output.write(data)
class StdioProxyProtocol(DataForwardingProtocol):
def connectionMade(self):
inputForwarder = DataForwardingProtocol( )
inputForwarder.output = self.transport
inputForwarder.normalizeNewlines = True
stdioWrapper = stdio.StandardIO(inputForwarder)
self.output = stdioWrapper
print "Connected to server. Press ctrl-C to close connection."
class StdioProxyFactory(protocol.ClientFactory):
protocol = StdioProxyProtocol
def clientConnectionLost(self, transport, reason):
reactor.stop( )
def clientConnectionFailed(self, transport, reason):
print reason.getErrorMessage( )
reactor.stop( )
if __name__ == '_ _main_ _':
import sys
if not len(sys.argv) == 3:
print "Usage: %s host port" % _ _file_ _
sys.exit(1)
reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory( ))
reactor.run( )
protocol.dataReceived, which you're overriding, is too low-level to serve for the purpose without smart buffering that you're not doing -- per the docs I just quoted,
Called whenever data is received.
Use this method to translate to a
higher-level message. Usually, some
callback will be made upon the receipt
of each complete protocol message.
Parameters
data
a string of
indeterminate length. Please keep in
mind that you will probably need to
buffer some data, as partial (or
multiple) protocol messages may be
received! I recommend that unit tests
for protocols call through to this
method with differing chunk sizes,
down to one byte at a time.
You appear to be completely ignoring this crucial part of the docs.
You could instead use LineReceiver.lineReceived (inheriting from protocols.basic.LineReceiver, of course) to take advantage of the fact that HTTP requests come in "lines" -- you'll still need to join up headers that are being sent as multiple lines, since as this tutorial says:
Header lines beginning with space or
tab are actually part of the previous
header line, folded into multiple
lines for easy reading.
Once you have a nicely formatted/parsed response (consider studying twisted.web's sources so see one way it could be done),
retrieve Domain name, Sub Domain, Port
values from this?
now the Host header (cfr the RFC section 14.23) is the one containing this info.
Just based on what you seems to be attempting, I think the following would be the path of least resistance:
http://twistedmatrix.com/documents/10.0.0/api/twisted.web.proxy.html
That's the twisted class for building an HTTP Proxy. It will let you intercept the requests, look at the destination and look at the sender. You can also look at all the headers and the content going back and forth. You seem to be trying to re-write the HTTP Protocol and Proxy class that twisted has already provided for you. I hope this helps.