Parse raw HTTP request with Werkzeug - python

I'm writing a fuzzer for a Flask application. I have example requests stored as text files, like this get.txt:
GET /docs/index.html HTTP/1.1
Host: www.w3.org
Ideally, I'd parse this to a werkzeug.wrappers.Request object, something like this (psuedo-code):
from werkzeug.wrappers import Request
req = Request()
with open('get.txt') as f:
req.parse_raw(f.read())
However, it looks like the raw HTTP parsing isn't happening in Werkzeug. Instead, Werkzeug takes a WSGI environ from BaseHTTPServer.BaseHTTPRequestHandler, and that requires a BaseHTTPServer.HTTPServer instance to parse the request. That seems like overkill for something this simple.
I also came across http-parser, which is closer to what I want, but it duplicates most of Werkzeug's data structures with incompatible types. I'd have to transform the data from one to another.
Is there a simpler way to go from raw HTTP request to WSGI environ in Werkzeug (or using BaseHTTPRequestHandler without an HTTP server)?

Didn't find an easy way to do this, so I wrote a library called Werkzeug-Raw to parse raw HTTP requests to WSGI environments (or even to open requests on test clients).
It works like this:
from flask import Flask, request
import werkzeug_raw
app = Flask(__name__)
environ = werkzeug_raw.environ('GET /foo/bar?tequila=42 HTTP/1.1')
with app.request_context(environ):
print request.args # ImmutableMultiDict([('tequila', u'42')])
To open a raw HTTP request on a test client:
client = app.test_client()
rv = werkzeug_raw.open(client, 'GET /foo/bar?tequila=42 HTTP/1.1')

Related

Flask Server not responding to HTTP POST Request

I built a simple flask server in order to automate python scripts based on an HTTP POST Request coming in from an outside service. This service sends these requests in response to events that we have created based on data conditions. All the flask server needs to do is parse the requests, and take the numerical value stored in the request and use it to run the corresponding python script. At one point, this system was working consistently. Fast forward a few months, the server was no longer working when tried again. No changes were made to the code. According to wireshark, The computer hosting the flask server is receiving the request to the correct port, but the request is now timing out. The host system is failing to respond to the request. Any Idea what is going on here? The firewall has temporarily been turned off.
Alternatively, is there another better package to achieve this goal with?
from flask import Flask, request
import threading
import runpy
app = Flask(__name__)
#app.route('/', methods=['POST'])
def PostHandler():
directory = {}
with open(r"M:\redacted") as f:
for line in f:
(key,val) = line.split()
directory[int(key)] = val
print(directory)
path = r"M:\redacted"
content = request.json
content = content['value']
print(content)
sel_script = int(content)
print(directory[sel_script])
runpy.run_path(path_name=path + directory[sel_script])
return
app.run(host="10.244.xx.xx", port=8080, threaded=True)

Sending a request from django to itself

I have a Django project that contains a second, third-party Django app. In one of my views, I need to query a view from the other app (within the same Django project).
Currently, it works something like this:
import requests
def my_view(request):
data = requests.get('http://localhost/foo/bar').json()
# mangle data
return some_response
Is there a clean way to send the request without going all the way through DNS and the webserver and just go directly to the other app's view (ideally going through the middleware as well of course)
A Django view function accepts a request and returns a response object. There is no reason that a view cannot invoke another view by constructing a request (or cloning or passing its own) and interpreting the response. (c.f. the testing framework).
Of course, if the other view has undesirable side-effects, then the controlling view will have to unwind them. Working within a transaction should allow it to delve in the results of the view it invoked, and then abort the transaction and perform its own.
You can use urllib as shown below
import urllib, urllib3, urllib.request
url = "http://abcd.com" # API URL
postdata = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url)
with urllib.request.urlopen(req, data=postdata) as response:
resp = response.read()
print(resp)

Serving HTTP client with versioned resource only if it has changed - using Flask

I am running a webserver based on Flask, which serves a resource being versioned (e.g. installation file of some versioned program). I want to serve my HTTP client with new resource only in case, it already does not have the current version available. If there is new version, I want the client to download the resource and install it.
my Flask server looks like this
import json
import redis
import math
import requests
from flask import Flask,render_template,request
app=Flask(__name__)
#app.route('/version', methods=['GET','POST'])
def getversion():
r_server=redis.Redis("127.0.0.1")
if request.method == 'POST':
jsonobj_recieve=request.data
data=json.loads(jsonobj)
currentversion=r_server.hget('version')
if data == currentversion:
#code to return a 'ok'
else:
#code to return 'not ok' also should send the updated file to the client
else:
return r_server.hget('version')
if __name__ == '__main__':
app.run(
debug=True,
host="127.0.0.1",
port=80
)
my client is very basic:
import sys
import json
import requests
url="http://127.0.0.1/version"
jsonobj=json.dumps(str(sys.argv[1]))
print jsonobj
r=requests.post(url,data=jsonobj)
I will likely have to recode the entire client, this is not a problem but I really have no idea where to start....
Requirements Review
have web app, serving a versioned resource. It can be e.g. file with an applications.
have client, which allows fetching the resource only in case, the version of resource on the server and what client has locally already available differ
the client is aware of version string of the resource
allow client to learn new version string if new version is available
HTTP like design of your solution
If you want to allow downloading an application only in case, the client does not have it already, following design could be used:
use etag header. This usually contains some string describing unique status of resource you want to get from that url. In your case it could be current version number of your application.
in your request, use header "if-none-match", providing version number of your application present at client. This will result in HTTP Status code 306 - Not Modified in case, your client and server share the same version of resource. In case it differs, you would simply provide the content of the resource and use it. Your resource shall also denote in etag current version of the resource and your client shall take note of it, or find new version name from other sources (like from the downloaded file).
This design follows HTTP principles.
Flask serving resource with declaring version in etag
This is focusing on showing the principle, you shall elaborate on providing real content of the resource.
from flask import Flask, Response, request
import werkzeug.exceptions
app = Flask(__name__)
class NotModified(werkzeug.exceptions.HTTPException):
code = 304
def get_response(self, environment):
return Response(status=304)
#app.route('/download/app')
def downloadapp():
currver = "1.0"
if request.if_none_match and currver in request.if_none_match:
raise NotModified
def generate():
yield "app_file_part 1"
yield "app_file_part 2"
yield "app_file_part 3"
return Response(generate(), headers={"etag": currver})
if __name__ == '__main__':
app.run(debug=True)
Client getting resource only, if it is new
import requests
ver = "1.0"
url = "http://localhost:5000/download/app"
req = requests.get(url, headers={"If-None-Match": ver})
if req.status_code == 200:
print "new content of resource", req.content
new_ver = req.headers["etag"]
else:
print "resource did not change since last time"
Alternative solution of web part using web server (e.g. NGINX)
Assuming the resource is static file, which updates only sometime, you shall be able configuring your web server, e.g. NGINX, to serve that resource and declaring in your configuration explicit value for etag header to the version string.
Note, that as it was not requested, this alternative solution is not elaborated here (and was not tested).
Client implementation would not be modified by that (here it pays back the design is following HTTP concepts).
There are multiple ways of achieving this but as this is a Flask app, here's one using HTTP.
If the version is OK, just return a relevant status code, like a 200 OK. You can add a JSON response in the body if that's necessary. If you return a string with flask, the status code will be 200 OK and you can inspect that in your client.
If the version differs, return the URL where the file is located. The client will have to
download the file. That's pretty simple using requests. Here's a typical example for downloading file by streaming requests:
def get(url, chunk_size=1024):
""" Download a file in chunks of n bytes """
fn = url.split("/")[-1] # if you're url is complicated, use urlparse.
stream = requests.get(url, stream=True)
with open(fn, "wb") as local:
for chunk in stream.iter_content(chunk_size=chunk_size):
if chunk:
f.write(chunk)
return fn
This is very simplified. If your file is not static and cannot live on the server (like software update patches probably shouldn't) then you'll have to figure out a way to get the file from a database or generate it on the fly.

Dumping HTTP requests with Flask

I am developing a Flask application based web application ( https://github.com/opensourcehacker/sevabot ) which has HTTP based API services.
Many developers are using and extending the API and I'd like to add a feature which prints Flask's HTTP request to Python logging output, so you can see raw HTTP payloads, source IP and headers you get.
What hooks Flask offers where this kind of HTTP request dumping would be the easiest to implement
Are there any existing solutions and best practices to learn from?
Flask makes a standard logger available at at current_app.logger, there's an example configuration in this gist, though you can centralise the logging calls in a before_request handler if you want to log every request:
from flask import request, current_app
#app.before_request
def log_request():
if current_app.config.get('LOG_REQUESTS'):
current_app.logger.debug('whatever')
# Or if you dont want to use a logger, implement
# whatever system you prefer here
# print request.headers
# open(current_app.config['REQUEST_LOG_FILE'], 'w').write('...')

Proxying to another web service with Flask

I want to proxy requests made to my Flask app to another web service running locally on the machine. I'd rather use Flask for this than our higher-level nginx instance so that we can reuse our existing authentication system built into our app. The more we can keep this "single sign on" the better.
Is there an existing module or other code to do this? Trying to bridge the Flask app through to something like httplib or urllib is proving to be a pain.
I spent a good deal of time working on this same thing and eventually found a solution using the requests library that seems to work well. It even handles setting multiple cookies in one response, which took a bit of investigation to figure out. Here's the flask view function:
from dotenv import load_dotenv # pip package python-dotenv
import os
#
from flask import request, Response
import requests # pip package requests
load_dotenv()
API_HOST = os.environ.get('API_HOST'); assert API_HOST, 'Envvar API_HOST is required'
#api.route('/', defaults={'path': ''}) # ref. https://medium.com/#zwork101/making-a-flask-proxy-server-online-in-10-lines-of-code-44b8721bca6
#api.route('/<path>')
def redirect_to_API_HOST(path): #NOTE var :path will be unused as all path we need will be read from :request ie from flask import request
res = requests.request( # ref. https://stackoverflow.com/a/36601467/248616
method = request.method,
url = request.url.replace(request.host_url, f'{API_HOST}/'),
headers = {k:v for k,v in request.headers if k.lower() == 'host'},
data = request.get_data(),
cookies = request.cookies,
allow_redirects = False,
)
#region exlcude some keys in :res response
excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection'] #NOTE we here exclude all "hop-by-hop headers" defined by RFC 2616 section 13.5.1 ref. https://www.rfc-editor.org/rfc/rfc2616#section-13.5.1
headers = [
(k,v) for k,v in res.raw.headers.items()
if k.lower() not in excluded_headers
]
#endregion exlcude some keys in :res response
response = Response(res.content, res.status_code, headers)
return response
Update April 2021: excluded_headers should probably include all "hop-by-hop headers" defined by RFC 2616 section 13.5.1.
I have an implementation of a proxy using httplib in a Werkzeug-based app (as in your case, I needed to use the webapp's authentication and authorization).
Although the Flask docs don't state how to access the HTTP headers, you can use request.headers (see Werkzeug documentation). If you don't need to modify the response, and the headers used by the proxied app are predictable, proxying is staightforward.
Note that if you don't need to modify the response, you should use the werkzeug.wsgi.wrap_file to wrap httplib's response stream. That allows passing of the open OS-level file descriptor to the HTTP server for optimal performance.
My original plan was for the public-facing URL to be something like http://www.example.com/admin/myapp proxying to http://myapp.internal.example.com/. Down that path leads madness.
Most webapps, particularly self-hosted ones, assume that they're going to be running at the root of a HTTP server and do things like reference other files by absolute path. To work around this, you have to rewrite URLs all over the place: Location headers and HTML, JavaScript, and CSS files.
I did write a Flask proxy blueprint which did this, and while it worked well enough for the one webapp I really wanted to proxy, it was not sustainable. It was a big mess of regular expressions.
In the end, I set up a new virtual host in nginx and used its own proxying. Since both were at the root of the host, URL rewriting was mostly unnecessary. (And what little was necessary, nginx's proxy module handled.) The webapp being proxied to does its own authentication which is good enough for now.

Categories