I'm setting up a small Python service to act as a REST API reverse proxy, and I'm hoping there are some libraries available to help speed this process up.
I need to be able to run a function that calculates a value to inject as a request header when the request is proxied through to the backend.
As it stands, I have a simple script that computes the value, injects it into an Nginx config file, and then forces an Nginx hot reload via signals, but I'm trying to remove this dependency for what should be a fairly simple task.
Would a good approach be to use Falcon as the listener and combine it with another approach to inject and forward requests?
Thanks for reading.
Edit: Been reading https://aiohttp.readthedocs.io/en/stable/ as it seems to be the right direction.
Thanks to someone over at falcon, this is now the accepted answer!
import io

import falcon
import requests


class Proxy(object):
    UPSTREAM = 'https://httpbin.org'

    def __init__(self):
        self.session = requests.Session()

    def handle(self, req, resp):
        # Copy the incoming headers, tag the request with a Via header, and
        # drop headers that should not be forwarded upstream.
        headers = dict(req.headers, Via='Falcon')
        for name in ('HOST', 'CONNECTION', 'REFERER'):
            headers.pop(name, None)

        request = requests.Request(req.method, self.UPSTREAM + req.path,
                                   data=req.bounded_stream.read(),
                                   headers=headers)
        prepared = request.prepare()
        from_upstream = self.session.send(prepared, stream=True)

        # Stream the upstream response back to the client.
        resp.content_type = from_upstream.headers.get('Content-Type',
                                                      falcon.MEDIA_HTML)
        resp.status = falcon.get_http_status(from_upstream.status_code)
        resp.stream = from_upstream.iter_content(io.DEFAULT_BUFFER_SIZE)


api = falcon.API()
api.add_sink(Proxy().handle)
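To tie this back to the original requirement (computing a value and injecting it as a request header before forwarding), here is a minimal sketch of a handle() variant with one extra line; compute_token() and the X-Computed-Token header name are placeholders of mine rather than part of the accepted answer, and the imports and Proxy class from the code above are reused:

def compute_token():
    # Hypothetical helper; replace with whatever calculation the backend expects.
    return 'example-token'


class HeaderInjectingProxy(Proxy):
    def handle(self, req, resp):
        headers = dict(req.headers, Via='Falcon')
        for name in ('HOST', 'CONNECTION', 'REFERER'):
            headers.pop(name, None)

        # The only change: add the computed header before forwarding upstream.
        headers['X-Computed-Token'] = compute_token()

        request = requests.Request(req.method, self.UPSTREAM + req.path,
                                   data=req.bounded_stream.read(),
                                   headers=headers)
        prepared = request.prepare()
        from_upstream = self.session.send(prepared, stream=True)

        resp.content_type = from_upstream.headers.get('Content-Type',
                                                      falcon.MEDIA_HTML)
        resp.status = falcon.get_http_status(from_upstream.status_code)
        resp.stream = from_upstream.iter_content(io.DEFAULT_BUFFER_SIZE)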
Scope:
I am currently trying to write a web scraper for this specific page. I have a pretty strong web-crawling background in C#, but this httplib is beating me up.
Problem:
When trying to make an HTTP GET request for the page specified above, I get a "Moved Permanently" response that points to the very same URL. I can make the request using the requests lib, but I want to make it work using httplib so I can understand what I am doing wrong.
Code Sample:
I am completely new to Python, so any broken conventions or odd syntax are C#'s fault.
import httplib


# Wrapper for an "HTTP GET" request
class HttpClient(object):
    def HttpGet(self, url, host):
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()


# Using the "HttpClient" class
httpclient = HttpClient()

# This is the full URL I need to make a GET request for: https://420101.com/strain-database
httpResponseText = httpclient.HttpGet('/strain-database', 'www.420101.com')
print httpResponseText
I really want to make it work using the httplib library, instead of requests or any other fancy one because I feel like I am missing something really small here.
The problem: I'd had either too little or too much caffeine in my system.
To GET an https URL, I needed the HTTPSConnection class.
Also, there is no 'www' in the address I wanted to GET, so it shouldn't be included in the host.
Both of the wrong addresses redirect me to the correct one with a 301 status code. If I were using requests or a more full-featured module, it would have followed the redirect automatically.
My Validation:
c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason
# Output: 200 OK
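For completeness, a small sketch of how the redirect could be followed by hand with httplib, since httplib (unlike requests) will not do it for you; this is my own illustration rather than part of the original answer, and it assumes the redirect target stays on https:

import httplib
from urlparse import urlparse


def get_following_redirects(host, path, max_hops=5):
    # Illustration only: follow 301/302 responses manually, assuming https throughout.
    for _ in range(max_hops):
        conn = httplib.HTTPSConnection(host)
        conn.request('GET', path)
        resp = conn.getresponse()
        if resp.status in (301, 302):
            target = urlparse(resp.getheader('Location'))
            host, path = target.netloc or host, target.path or '/'
            continue
        return resp.read()
    raise Exception('Too many redirects')


print get_following_redirects('www.420101.com', '/strain-database')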
I apologize if this has been answered elsewhere; I've looked around and haven't found a definitive answer.
I'd like to use Tornado to accept an HTTP request with querystring parameters, use those params to call a NOAA web service to fetch weather data, process/parse the NOAA response, then return the final data to the user.
I'm looking at Tornado because I can't count on the latency or availability of the web service request and I need the calls to be non-blocking. (otherwise I'd just use Django)
I also want to make sure I can set an appropriate timeout on the NOAA request so I can give up as necessary.
Note: I'm also open to using Twisted, though it seems to have a much steeper learning curve and my needs feel pretty simple. (I would do this in Node.js, but I'm much more comfortable handling the parsing requirements in Python)
Thanks in advance to anyone who can help point me in the right direction.
I will open-source the server process once finished and credit anyone who contributes examples or RTFM links to the appropriate documentation.
I've extracted this code sample from my project. It's not perfect, but it gives an idea of how to use Tornado's AsyncHTTPClient.
import logging
from urllib import urlencode

import tornado.escape
import tornado.gen
import tornado.httpclient


@tornado.gen.engine
def async_request(self, callback, server_url, method=u'GET', body=None, **kwargs):
    """
    Make an async request to the server.

    :param callback: callback to pass the results to
    :param server_url: path to the required API
    :param method: HTTP method to use, default - GET
    :param body: HTTP request body for POST requests, default - None
    :return: None
    """
    # Note: extracted from a class, hence the `self` parameter.
    args = {}
    if kwargs:
        args.update(kwargs)
    url = '%s?%s' % (server_url, urlencode(args))
    request = tornado.httpclient.HTTPRequest(url, method, body=body)
    http = tornado.httpclient.AsyncHTTPClient()

    # Yield control back to the IOLoop until the fetch completes.
    response = yield tornado.gen.Task(http.fetch, request)

    if response.error:
        logging.warning("Error response %s fetching %s", response.error, response.request.url)
        callback(None)
        return

    data = tornado.escape.json_decode(response.body) if response else None
    callback(data)
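The question also asked about being able to give up on a slow NOAA call; one way to do that is with the timeout arguments on HTTPRequest (the values below are just illustrative):

# Assumed values: 2 s to connect, 10 s for the whole request.
request = tornado.httpclient.HTTPRequest(
    url,
    method,
    body=body,
    connect_timeout=2.0,
    request_timeout=10.0,
)
# When a timeout is exceeded, the fetch completes with an HTTP 599 error,
# which the response.error check above will catch.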
The code below is an HTTP proxy for content filtering. It uses GET to send the URL of the current site to a server, which processes it and responds. It runs VERY, VERY, VERY slowly. Any ideas on how to make it faster?
Here is the code:
from twisted.internet import reactor
from twisted.web import http
from twisted.web.proxy import Proxy, ProxyRequest
from Tkinter import *
#import win32api
import urllib2
import urllib
import os
import sys
import webbrowser

cwd = os.path.abspath(sys.argv[0])[0]
proxies = {}
user = "zachb"


class BlockingProxyRequest(ProxyRequest):
    def process(self):
        # Ask the filtering server whether this URL is allowed for this user.
        params = {}
        params['Location'] = self.uri
        params['User'] = user
        params = urllib.urlencode(params)
        req = urllib.urlopen("http://weblock.zbrowntechnology.info/ProgFiles/stats.php?%s" % params, proxies=proxies)
        resp = req.read()
        req.close()
        if resp == "allow":
            pass
        else:
            self.transport.write('''BLOCKED BY ADMIN!''')
            self.transport.loseConnection()
        ProxyRequest.process(self)


class BlockingProxy(Proxy):
    requestFactory = BlockingProxyRequest


factory = http.HTTPFactory()
factory.protocol = BlockingProxy
reactor.listenTCP(8000, factory)
reactor.run()
Anyone have any ideas on how to make this run faster? Or even a better way to write it?
The main cause of slowness in this proxy is probably these three lines:
req = urllib.urlopen("http://weblock.zbrowntechnology.info/ProgFiles/stats.php?%s" % params, proxies=proxies)
resp = req.read()
req.close()
A normal Twisted-based application is single-threaded. You have to go out of your way to get threads involved. That means that whenever a request comes in, you are blocking the one and only processing thread on this HTTP request. No further requests are processed until this HTTP request completes.
Try using one of the APIs in twisted.web.client (e.g. Agent or getPage). These APIs don't block, so your server will handle concurrent requests concurrently. That should translate into much smaller response times.
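A minimal sketch of what that could look like for the proxy above, using getPage so the filtering lookup no longer blocks the reactor. This is my own illustration of the suggestion rather than code from the answer, and it assumes the proxy should fail closed when the filtering server cannot be reached:

import urllib

from twisted.web.client import getPage
from twisted.web.proxy import ProxyRequest

user = "zachb"  # same value as in the question


class NonBlockingProxyRequest(ProxyRequest):
    def process(self):
        params = urllib.urlencode({'Location': self.uri, 'User': user})
        # getPage returns a Deferred, so the reactor keeps serving other
        # requests while the filtering server is consulted.
        d = getPage("http://weblock.zbrowntechnology.info/ProgFiles/stats.php?%s" % params)
        d.addCallback(self._checkVerdict)
        d.addErrback(self._lookupFailed)

    def _checkVerdict(self, resp):
        if resp == "allow":
            # Hand off to the normal proxying machinery once allowed.
            ProxyRequest.process(self)
        else:
            self.transport.write('''BLOCKED BY ADMIN!''')
            self.transport.loseConnection()

    def _lookupFailed(self, failure):
        # Assumption: fail closed and drop the connection on lookup errors.
        self.transport.loseConnection()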
I need a little help here. I am new to Python and I am trying to build a small app that can tell me whether my website is down or not, then post the result to Twitter.
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util


class Tweet(webapp.RequestHandler):
    def get(self):
        import oauth
        # The TWITTER_* credentials are assumed to be defined elsewhere in the project.
        client = oauth.TwitterClient(TWITTER_CONSUMER_KEY,
                                     TWITTER_CONSUMER_SECRET,
                                     None)
        webstatus = {"status": "this is where the site status needs to be",
                     "lat": 44.42765100,
                     "long": 26.103172}
        client.make_request('http://twitter.com/statuses/update.json',
                            token=TWITTER_ACCESS_TOKEN,
                            secret=TWITTER_ACCESS_TOKEN_SECRET,
                            additional_params=webstatus,
                            protected=True,
                            method='POST')
        self.response.out.write(webstatus)


def main():
    application = webapp.WSGIApplication([('/', Tweet)])
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
Now the website-check part is missing. I am extremely new to Python and need a little help: is there a function or class that can check a specific URL, so the response or error code can be sent to Twitter by the script above? This is my first time working with Python.
In case you are wondering, the class above uses the https://github.com/mikeknapp/AppEngine-OAuth-Library lib.
Cheers.
PS: the URL check needs to be based on the urlfetch class, which is safer for Google App Engine.
You could use the Google App Engine URL Fetch API.
The fetch() function returns a Response object containing the HTTP status_code.
Just fetch the url and check the status with something like this:
from google.appengine.api import urlfetch


def is_down(url):
    result = urlfetch.fetch(url, method=urlfetch.HEAD)
    return result.status_code != 200
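For illustration, a rough sketch of how is_down() could feed the status text in the Tweet handler from the question; MONITORED_URL and the message wording are placeholders of mine:

MONITORED_URL = 'http://example.com/'  # placeholder for the site being monitored


def build_status():
    # Turn the up/down check into the text that gets posted to Twitter.
    if is_down(MONITORED_URL):
        return "%s appears to be DOWN" % MONITORED_URL
    return "%s is up" % MONITORED_URL

# Then, inside Tweet.get():
#     webstatus = {"status": build_status(), "lat": 44.42765100, "long": 26.103172}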
Checking if a website exists:
import httplib
from httplib import HTTP
from urlparse import urlparse


def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK
We only fetch the headers of the given URL and check the web server's response code.
Update: the last line was modified per Daenyth's remark.
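A quick usage sketch (the URL is just an example; note that the old httplib.HTTP class only speaks plain HTTP):

# Prints True when the server answers the HEAD request with 200 OK, False otherwise.
print checkUrl('http://www.example.com/')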
Using urllib2, are we able to use a method other than 'GET' or 'POST' (when data is provided)?
I dug into the library and it seems that the decision to use GET or POST is 'conveniently' tied to whether or not data is provided in the request.
For example, I want to interact with a CouchDB database, which requires methods such as DELETE and PUT. I want the handlers of urllib2, but I need to make my own method calls.
I WOULD PREFER NOT to import 3rd-party modules into my project, such as the CouchDB Python API, so let's not go down that road. My implementation must use the modules that ship with Python 2.6 (my design spec requires a bare-bones PortablePython distribution). I would write my own interface using httplib before importing external modules.
Thanks so much for the help
You could subclass urllib2.Request like so (untested)
import urllib2


class MyRequest(urllib2.Request):
    GET = 'GET'
    POST = 'POST'
    PUT = 'PUT'
    DELETE = 'DELETE'

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False, method=None):
        urllib2.Request.__init__(self, url, data, headers, origin_req_host, unverifiable)
        self.method = method

    def get_method(self):
        # Fall back to urllib2's usual GET/POST choice when no method is given.
        if self.method:
            return self.method
        return urllib2.Request.get_method(self)


opener = urllib2.build_opener(urllib2.HTTPHandler)
req = MyRequest('http://yourwebsite.com/put/resource/', method=MyRequest.PUT)
resp = opener.open(req)
It could be:
import urllib2

method = 'PUT'  # or 'DELETE', 'PATCH', etc.
request = urllib2.Request('http://host.com')
request.get_method = lambda: method

That is, a runtime modification of the request instance, a.k.a. a monkey patch.
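Since the question mentions being willing to write an httplib-based interface rather than pulling in external modules, here is a rough sketch of what that could look like for CouchDB-style PUT/DELETE calls; the host, port, database, document id, and revision are all placeholders:

import httplib
import json  # in the standard library as of Python 2.6

# Placeholder CouchDB location and document.
conn = httplib.HTTPConnection('localhost', 5984)
body = json.dumps({'title': 'example document'})

# PUT creates or updates a document at a known id.
conn.request('PUT', '/mydb/mydoc', body, {'Content-Type': 'application/json'})
print conn.getresponse().read()

# DELETE removes it again; CouchDB expects the document's current _rev,
# so the value below is only a placeholder.
conn.request('DELETE', '/mydb/mydoc?rev=1-abc')
print conn.getresponse().read()

conn.close()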