Basic HTTP Parsing Using Twisted - python

I am a newcomer to Python and Twisted, so please excuse any ignorance in how I ask this. As a first program, I am trying to write a basic HTTP server using twisted.web.server that simply prints the HTTP request to the screen and then prints the HTTP response. I want to print the entire message. Here is what I have so far:
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.resource import Resource
class TestPage(Resource):
    isLeaf = True
    def render_GET(self, request):
        response = "Success"
        print "Your request was %s" % request
        print "The server's response was %s" % response
        return response
resource = TestPage()
factory = Site(resource)
reactor.listenTCP(8000, factory)
reactor.run()
So far, I am having success printing the request. What I want to know is where I can access the raw response data, not just the textual message. Also, if I wanted to start parsing the request/response for information, what would be the best way to go about doing that?

Take a look at the Request and IRequest API docs to get an idea of what that request parameter offers you. You should be able to find just about everything in the request there.
I'm not sure what you mean by raw response data though. The response is up to you to generate.
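To make the pointer to the Request docs concrete: the request object handed to render_GET exposes the method, URI, headers and body through its method, uri, requestHeaders and content attributes. The sketch below assumes Python 3, where Twisted gives these values as bytes and render_GET must return bytes. describe_request and run_server are hypothetical helper names, and port 8000 mirrors the question. Keeping the formatter apart from the server means it can be tried on a stand-in object without starting a reactor.

```python
def describe_request(request):
    """Return a printable dump of a twisted.web request: request line, headers, body.

    Uses only request.method, request.uri, request.requestHeaders and
    request.content, which Twisted exposes as bytes / a file-like object.
    """
    lines = ["%s %s" % (request.method.decode("latin-1"), request.uri.decode("latin-1"))]
    # getAllRawHeaders() yields (name, [value, ...]) pairs of bytes.
    for name, values in request.requestHeaders.getAllRawHeaders():
        for value in values:
            lines.append("%s: %s" % (name.decode("latin-1"), value.decode("latin-1")))
    # The body (empty for most GETs) is buffered in a file-like object.
    request.content.seek(0)
    body = request.content.read()
    lines.append("")
    lines.append(body.decode("utf-8", "replace"))
    return "\n".join(lines)


def run_server(port=8000):
    """Start the question's server, printing each request (requires Twisted)."""
    from twisted.internet import reactor
    from twisted.web.resource import Resource
    from twisted.web.server import Site

    class TestPage(Resource):
        isLeaf = True

        def render_GET(self, request):
            response = b"Success"
            print(describe_request(request))
            print("The server's response body was %r" % response)
            return response  # Python 3 Twisted expects bytes here

    reactor.listenTCP(port, Site(TestPage()))
    reactor.run()
```

On the response side there is no finished "raw response" object to inspect: the body is whatever render_GET returns, and the status and headers are whatever you set with request.setResponseCode() and request.setHeader().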

Related

HTTP Get Request "Moved Permanently" using HttpLib

Scope:
I am currently trying to write a web scraper for a specific page. I have a pretty strong web-crawling background in C#, but httplib has me beaten.
Problem:
When I make an HTTP GET request for the page, I get a 301 "Moved Permanently" response that points to the very same URL. I can make the request with the requests library, but I want it to work with httplib so I can understand what I am doing wrong.
Code Sample:
I am completely new to Python, so blame any unidiomatic style or syntax on C#.
import httplib
# Wrapper for an HTTP GET request
class HttpClient(object):
    def HttpGet(self, url, host):
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()
# Using the HttpClient class
httpclient = HttpClient()
# This is the full URL I need to make a GET request for: https://420101.com/strain-database
httpResponseText = httpclient.HttpGet('/strain-database', 'www.420101.com')
print httpResponseText
I really want to make it work using the httplib library, instead of requests or any other fancy one because I feel like I am missing something really small here.
The problem: I'd had either too little or too much caffeine.
To GET an HTTPS URL, I needed the HTTPSConnection class.
Also, there is no 'www' in the address I wanted to GET, so it shouldn't be included in the host.
Both of the wrong addresses redirect to the correct one with a 301 status code. If I were using requests or a more full-featured module, it would have followed the redirect automatically.
My Validation:
c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason
200 OK
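If you want httplib itself to follow the 301 the way requests does, you have to read the Location header and open a new connection of the right kind (HTTP or HTTPS). A minimal sketch for Python 3, where httplib is named http.client; split_url and get_following_redirects are hypothetical helper names, and the set of redirect codes and the limit of 5 redirects are assumptions.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def split_url(url):
    """Split an absolute URL into (scheme, host[:port], path?query) for http.client."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def get_following_redirects(url, max_redirects=5, timeout=10):
    """GET url with http.client, following redirects by hand.

    Returns (final_url, status, body). Raises RuntimeError after too many redirects.
    """
    for _ in range(max_redirects + 1):
        scheme, host, path = split_url(url)
        if scheme == "https":
            conn = http.client.HTTPSConnection(host, timeout=timeout)
        elif scheme == "http":
            conn = http.client.HTTPConnection(host, timeout=timeout)
        else:
            raise ValueError("unsupported scheme: %r" % scheme)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            body = resp.read()  # read the body before the connection is closed
            location = resp.getheader("Location")
            if resp.status in REDIRECT_CODES and location:
                # Location may be relative, so resolve it against the current URL.
                url = urljoin(url, location)
                continue
            return url, resp.status, body
        finally:
            conn.close()
    raise RuntimeError("gave up after %d redirects" % max_redirects)
```

Going by the answer, calling get_following_redirects("http://www.420101.com/strain-database") should end on the HTTPS, no-www URL; the returned final_url shows where it landed.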

Using urllib2.urlopen() to load a URL without waiting for a response

I just want to send the request without wasting time waiting for the response, because the response is useless to me.
I looked through the Python docs but didn't find a solution.
Thanks for any advice.
I have tried
urllib2.urlopen(url, timeout=0.02)
but I can't be sure whether the request is actually sent.
This is called asynchronous loading, and here's a blog post explaining how to do it with urllib2. Sample code:
#!/usr/bin/env python
import urllib2
import threading
class MyHandler(urllib2.HTTPHandler):
    def http_response(self, req, response):
        print "url: %s" % (response.geturl(),)
        print "info: %s" % (response.info(),)
        for l in response:
            print l
        return response
o = urllib2.build_opener(MyHandler())
t = threading.Thread(target=o.open, args=('http://www.google.com/',))
t.start()
print "I'm asynchronous!"
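The main thread carries on immediately while the request and the response handling run in the background thread. Because that thread is not a daemon thread, the interpreter still waits for it to finish before exiting.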
The easiest way to achieve asynchronous function calling like that is by using the callable in its own thread or subprocess.
http://docs.python.org/library/threading.html#threading.Thread
http://docs.python.org/library/multiprocessing.html#multiprocessing.Process
There are also HTTP libraries for Python that already implement this, such as grequests and requests-futures (both built on the requests library, whose API is an improvement over urllib2's).
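For Python 3 (where urllib2 became urllib.request), the threading approach can be wrapped in a small helper. A minimal sketch; fire_and_forget and its opener parameter are hypothetical names, with opener defaulting to urllib.request.urlopen so it can be swapped out for testing. The thread is deliberately left non-daemon so the interpreter waits for the request to complete before exiting, which addresses the worry about whether it was sent at all; a very short timeout such as 0.02 s can abort the connection before the request goes out.

```python
import threading
import urllib.request


def fire_and_forget(url, timeout=10, opener=urllib.request.urlopen):
    """Send a GET to url on a background thread and return the thread immediately.

    The response body is read and discarded. Errors are swallowed because the
    caller has said it does not care about the outcome.
    """
    def worker():
        try:
            with opener(url, timeout=timeout) as resp:
                resp.read()  # drain the response so the connection closes cleanly
        except Exception:
            pass  # the response is not wanted, so failures are ignored too

    # Non-daemon: the process will not exit until the request has finished.
    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

The caller can ignore the returned thread, or call join() on it where it matters that the request has completed.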

Non-blocking/async URL fetching in Tornado request

I apologize if this has been answered elsewhere; I've looked around and haven't found a definitive answer.
I'd like to use Tornado to accept an HTTP request with querystring parameters, use those params to call a NOAA web service to fetch weather data, process/parse the NOAA response, then return the final data to the user.
I'm looking at Tornado because I can't count on the latency or availability of the web service request and I need the calls to be non-blocking. (otherwise I'd just use Django)
I also want to make sure I can set an appropriate timeout on the NOAA request so I can give up as necessary.
Note: I'm also open to using Twisted, though it seems to have a much steeper learning curve and my needs feel pretty simple. (I would do this in Node.js, but I'm much more comfortable handling the parsing requirements in Python)
Thanks in advance for anyone who can help point me in the right direction.
I will open-source the server process once finished and credit anyone who contributes examples or RTFM links to the appropriate documentation.
I've extracted a code sample from my project. It's not perfect, but it gives an idea of how to use Tornado's AsyncHTTPClient:
import logging
from urllib import urlencode
import tornado.escape
import tornado.gen
import tornado.httpclient
# Defined as a method on a tornado.web.RequestHandler subclass
@tornado.gen.engine
def async_request(self, callback, server_url, method=u'GET', body=None, **kwargs):
    """
    Make async request to server
    :param callback: callback to pass results
    :param server_url: path to required API
    :param method: HTTP method to use, default - GET
    :param body: HTTP request body for POST request, default - None
    :return: None
    """
    args = {}
    if kwargs:
        args.update(kwargs)
    url = '%s?%s' % (server_url, urlencode(args))
    request = tornado.httpclient.HTTPRequest(url, method, body=body)
    http = tornado.httpclient.AsyncHTTPClient()
    response = yield tornado.gen.Task(http.fetch, request)
    if response.error:
        logging.warning("Error response %s fetching %s", response.error, response.request.url)
        callback(None)
        return
    data = tornado.escape.json_decode(response.body) if response else None
    callback(data)
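The snippet above uses the callback-era gen.engine/gen.Task style, which newer Tornado releases no longer support. On a recent Tornado (5 or later, an assumption about your version) the same idea reads more naturally as native coroutines, and HTTPRequest's connect_timeout and request_timeout arguments cover the "give up as necessary" requirement. fetch_json, build_url, make_app and the /weather route are hypothetical names, the 5-second timeout is an arbitrary default, and the upstream URL stands in for the NOAA endpoint, which the question does not give. Tornado is imported inside the functions only so the helpers can be imported and tested on their own.

```python
import json
import logging
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append querystring parameters to base_url (pass-through if there are none)."""
    if not params:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


async def fetch_json(base_url, params=None, timeout=5.0):
    """Fetch and decode JSON from an upstream service without blocking the IOLoop.

    Returns None if the upstream request fails or exceeds `timeout` seconds.
    """
    from tornado.httpclient import AsyncHTTPClient, HTTPRequest

    request = HTTPRequest(build_url(base_url, params), method="GET",
                          connect_timeout=timeout, request_timeout=timeout)
    try:
        response = await AsyncHTTPClient().fetch(request)
    except Exception as exc:  # HTTP errors, timeouts and connection failures
        logging.warning("Upstream fetch of %s failed: %s", request.url, exc)
        return None
    return json.loads(response.body)


def make_app(upstream_url):
    """Build an Application whose /weather handler forwards its query upstream."""
    import tornado.web

    class WeatherHandler(tornado.web.RequestHandler):
        async def get(self):
            # Forward every incoming query argument (last value wins).
            params = {name: self.get_argument(name) for name in self.request.arguments}
            data = await fetch_json(upstream_url, params)
            if data is None:
                self.set_status(502)
                self.write({"error": "upstream unavailable"})
            else:
                self.write({"upstream": data})  # write() serializes dicts as JSON

    return tornado.web.Application([(r"/weather", WeatherHandler)])
```

To run it, call make_app(...).listen(8888) with the real upstream URL and then start the loop with tornado.ioloop.IOLoop.current().start(). While one request waits on the upstream service, the IOLoop keeps serving others.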

Get POST data from WebKitNetworkRequest

I am trying to send data from a javascript app running in GTK webkit to Python via a HTTP request with the data sent in POST.
I can capture the request using resource-request-starting and checking the uri of the request.
I know the request works because I can send data through the request headers and view it with
import urllib
def on_resource_request_starting(view, frame, resource, request, response):
    uri = urllib.unquote(request.props.uri)
    if uri.startswith('http://appname.local/'):
        print request.get_message().request_headers.get_one('foobar')
But when I use print request.get_message().request_body.data I don't get anything.
How do I view the POST data?
I haven't used this API, but I believe you need to call the binding's equivalent of soup_message_body_flatten() before reading the data field. See the documentation for SoupMessageBody in the C API.
So, at a guess:
print request.get_message().request_body.flatten().data
Hooking into the SoupSession "request-queued" signal and getting the buffer(s) with soup_message_body_get_chunk(soupmsgbody, num) seems to work (in webkitgtk1, as of June 2015).
webkit_get_default_session() returns the SoupSession in question.

Python httplib POST request and proper formatting

I'm currently working on an automated way to interface with a database website that has RESTful web services installed. I'm having trouble figuring out how to format the requests listed on the following site so I can send them from Python.
https://neesws.neeshub.org:9443/nees.html
Particular example is this:
POST https://neesws.neeshub.org:9443/REST/Project/731/Experiment/1706/Organization
<Organization id="167"/>
The biggest problem is that I do not know where to put the XML formatted part of the above. I want to send the above as a python HTTPS request and so far I've been trying something of the following structure.
>>>import httplib
>>>conn = httplib.HTTPSConnection("neesws.neeshub.org:9443")
>>>conn.request("POST", "/REST/Project/731/Experiment/1706/Organization")
>>>conn.send('<Organization id="167"/>')
But this appears to be completely wrong. I've never used Python with web-service interfaces before, so my main question is: how exactly am I supposed to use httplib to send the POST request, particularly the XML-formatted part of it? Any help is appreciated.
You need to set some request headers before sending the data, for example Content-Type set to 'text/xml'. Check out a few examples:
Post-XML-Python-1
Which has this code as example:
import httplib
HOST = "www.example.com"
API_URL = "/your/api/url"
def do_request(xml_location):
    """HTTP XML POST request"""
    with open(xml_location, "r") as f:
        request = f.read()
    webservice = httplib.HTTP(HOST)
    webservice.putrequest("POST", API_URL)
    webservice.putheader("Host", HOST)
    webservice.putheader("User-Agent", "Python post")
    webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.endheaders()
    webservice.send(request)
    statuscode, statusmessage, header = webservice.getreply()
    result = webservice.getfile().read()
    print statuscode, statusmessage, header
    print result
do_request("myfile.xml")
Post-XML-Python-2
These should give you some ideas.
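With Python 3's http.client (the renamed httplib), the question's code can be fixed by passing the XML as the body argument of request() together with a Content-Type header, rather than calling send() afterwards; http.client then computes Content-Length from the bytes body, so the headers describe the XML that follows them. A minimal sketch; post_xml is a hypothetical helper, and the text/xml content type follows the answer above rather than anything confirmed from the NEES service documentation.

```python
import http.client


def post_xml(host, path, xml, use_https=True, timeout=10):
    """POST an XML document and return (status, reason, body).

    host may include a port, e.g. "neesws.neeshub.org:9443".
    """
    conn_class = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_class(host, timeout=timeout)
    body = xml.encode("utf-8")
    # http.client fills in Content-Length for a bytes body.
    headers = {"Content-Type": 'text/xml; charset="UTF-8"'}
    try:
        # request() sends the request line, headers and body in one call,
        # so no separate send() is needed afterwards.
        conn.request("POST", path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.read()
    finally:
        conn.close()
```

For the example in the question, the call would be post_xml("neesws.neeshub.org:9443", "/REST/Project/731/Experiment/1706/Organization", '<Organization id="167"/>').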
