Restricting POST requests to a maximum size on Pyramid - python

I am writing a web application with Pyramid and would like to restrict the maximum length of POST requests, so that people can't post huge amounts of data and exhaust all the memory on the server. However, I have looked pretty much everywhere I could think of (Pyramid, WebOb, Paster) and couldn't find any option to accomplish this. I've seen that Paster has limits for the number of HTTP headers, the length of each header, and so on, but I didn't see anything for the size of the request body.
The server will be accepting POST requests only for JSON-RPC, so I don't need to allow huge request body sizes. Is there a way of accomplishing this in the Pyramid stack?
Just in case this is not obvious from the rest: a solution that has to accept and load the whole request body into memory before checking the length and returning a 4xx error code defeats the purpose of what I'm trying to do, and is not what I'm looking for.

This is not really a direct answer to your question. As far as I know, you can create a WSGI app that reads the request body only while it stays below a configured limit and passes the request on to the next WSGI layer; if it goes above the limit, you can stop reading and return an error directly.
But to be honest, I really don't see the point of doing it in Pyramid. If you run Pyramid behind a reverse proxy such as nginx or Apache, you can always limit the size of the request at the frontend server.
Unless you want to run Pyramid with Waitress or Paster directly without any proxy, you should handle body size in the frontend server, which will be more efficient than doing it in Python.
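For example, with nginx in front of the app, the limit is a single directive (the 1m value below is just an illustration; pick whatever fits your JSON-RPC payloads):

```nginx
# in the server or location block that proxies to Pyramid
client_max_body_size 1m;   # nginx answers 413 for larger bodies
```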
Edit
I did some research; it isn't a complete answer, but here is something that can be used, I guess. As far as I can tell, you have to read environ['wsgi.input']. This is a file-like object that receives chunks of data from nginx or Apache, for example.
What you really have to do is read that file until the maximum length is reached. If it is reached, raise an error; if it isn't, continue the request.
You might want to have a look at this answer.
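A minimal sketch of that idea: read environ['wsgi.input'] in chunks and reject the request as soon as a limit is crossed, so the full body is never buffered. All names and the 413 status here are my own choices, not Pyramid APIs:

```python
from io import BytesIO


def limit_body_middleware(app, max_size=1024):
    """WSGI middleware that rejects bodies larger than max_size bytes.

    Reads the input stream incrementally; only bodies that fit under
    the limit are buffered and handed to the wrapped app.
    """
    def middleware(environ, start_response):
        stream = environ.get('wsgi.input')
        body = BytesIO()
        read_so_far = 0
        while stream is not None:
            chunk = stream.read(8192)
            if not chunk:
                break
            read_so_far += len(chunk)
            if read_so_far > max_size:
                # stop reading immediately and answer with an error
                start_response('413 Request Entity Too Large',
                               [('Content-Type', 'text/plain')])
                return [b'request body too large']
            body.write(chunk)
        body.seek(0)
        environ['wsgi.input'] = body  # hand the buffered body to the app
        return app(environ, start_response)
    return middleware
```

Any WSGI-compliant app can be wrapped this way; the chunk size of 8192 is arbitrary.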

You can do it in a variety of ways; here are a couple of examples: one using WSGI middleware based on WebOb (installed when you install Pyramid, among other things), and one that uses Pyramid's event mechanism.
"""
restricting execution based on request body size
"""
from pyramid.config import Configurator
from pyramid.view import view_config
from pyramid.events import NewRequest, subscriber
from webob import Response, Request
from webob.exc import HTTPBadRequest
import unittest
def restrict_body_middleware(app, max_size=0):
"""
this is straight wsgi middleware and in this case only depends on
webob. this can be used with any wsgi compliant web
framework(which is pretty much all of them)
"""
def m(environ, start_response):
r = Request(environ)
if r.content_length <= max_size:
return r.get_response(app)(environ, start_response)
else:
err_body = """
request content_length(%s) exceeds
the configured maximum content_length allowed(%s)
""" % (r.content_length, max_size)
res = HTTPBadRequest(err_body)
return res(environ, start_response)
return m
def new_request_restrict(event):
"""
pyramid event handler called whenever there is a new request
recieved
http://docs.pylonsproject.org/projects/pyramid/en/1.2-branch/narr/events.html
"""
request = event.request
if request.content_length >= 0:
raise HTTPBadRequest("too big")
#view_config()
def index(request):
return Response("HI THERE")
def make_application():
"""
make appplication with one view
"""
config = Configurator()
config.scan()
return config.make_wsgi_app()
def make_application_with_event():
"""
make application with one view and one event subsriber subscribed
to NewRequest
"""
config = Configurator()
config.add_subscriber(new_request_restrict, NewRequest)
return config.make_wsgi_app()
def make_application_with_middleware():
"""
make application with one view wrapped in wsgi middleware
"""
return restrict_body_middleware(make_application())
class TestWSGIApplication(unittest.TestCase):
def testNoRestriction(self):
app = make_application()
request = Request.blank("/", body="i am a request with a body")
self.assert_(request.content_length > 0, "content_length should be > 0")
response = request.get_response(app)
self.assert_(response.status_int == 200, "expected status code 200 got %s" % response.status_int)
def testRestrictedByMiddleware(self):
app = make_application_with_middleware()
request = Request.blank("/", body="i am a request with a body")
self.assert_(request.content_length > 0, "content_length should be > 0")
response = request.get_response(app)
self.assert_(response.status_int == 400, "expected status code 400 got %s" % response.status_int)
def testRestrictedByEvent(self):
app = make_application_with_event()
request = Request.blank("/", body="i am a request with a body")
self.assert_(request.content_length > 0, "content_length should be > 0")
response = request.get_response(app)
self.assert_(response.status_int == 400, "expected status code 400 got %s" % response.status_int)
if __name__ == "__main__":
unittest.main()

Related

How to pass parameters to a FastAPI middleware and influence its processing logic?

We know that through request.state we can pass custom data from a middleware's before-handler code into the handlers and thus influence their behaviour. I'm currently wondering how a handler can influence the middleware logic that runs after it.
My specific business scenario is that I have a routed address (let's take /api for example) whose function is to calculate a dynamic result; this takes a relatively long time and returns a relatively large JSON response. Operationally, an effective way to improve efficiency is to use a buffer (like Redis) to cache its results; this way, the calculation time is saved every time the cache hits.
Since my memory is limited, I want to store gzip-compressed bytes in the buffer instead of raw JSON, which will greatly increase the amount I can cache. Specifically, because there are a large number of numbers in the response, a response is usually about 20MB in size without gzip, while after compression it is only about 1MB. This means that with 1GB of memory I can cache only 50 different responses without compression, but 1000 with compression, a significant difference that can hardly be ignored.
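That kind of ratio is plausible for number-heavy JSON, and it can be checked with the standard library alone (the payload below is made up purely for illustration, not the real response):

```python
import gzip
import json

# Made-up payload dominated by repeated numbers, loosely mimicking the
# described response shape.
payload = json.dumps({"response_content": [1346269] * 100000}).encode("utf-8")

compressed = gzip.compress(payload)

# Highly repetitive JSON typically compresses far better than 20:1.
print(len(payload), len(compressed), len(payload) // len(compressed))
```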
Because of these requirements, I wanted to implement a full-featured gzip middleware, but there are several technical confusions. The first is that I want to control whether the middleware compresses or not: if the response did not hit the cache and was generated dynamically, it should be compressed; if it came from the cache, it should not be compressed again, since it has already been compressed once. The second question is, even if I can tell the middleware not to compress, how do I replace its result with bytes that have already been compressed when there is no need to run the compression logic?
Since I don't yet know how to implement this, forgive me that I can only provide some pseudo-code to illustrate my thinking.
The following code describes a relatively complex response address that does not contain middleware:
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()

@app.post("/api")
async def root(request: Request, some_args: int):
    # Using the Fibonacci series to simulate a time-consuming
    # computational operation
    def fib_recur(n):
        if n <= 1:
            return n
        return fib_recur(n-1) + fib_recur(n-2)
    # In addition to the time-consuming calculation, the size of
    # the returned content is also large.
    return {"response_content": [fib_recur(31)] * 10000000}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)
I want to achieve the following after adding middleware:
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from utils import fib_recur, search_for_if_cache_hit
import uvicorn

app = FastAPI()

@app.post("/api")
async def root(request: Request, some_args: int):
    cache_hit_flag, cache_content = search_for_if_cache_hit('/api', some_args)
    if cache_hit_flag == True:
        # Skip calculation
        response = SomeKindOfResponse(message_content=cache_content)
        response.store['need_gzip'] = False
    else:
        # Calculate normally
        response = JSONResponse({"response_content": [fib_recur(31)] * 10000000})
        response.store['need_gzip'] = True
    return response

@app.middleware("http")
async def gzip_middleware(request: Request, call_next):
    response = await call_next(request)
    if response.store['need_gzip'] == True:
        response = gzip_handler(response)
    else:
        pass
    return response

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)
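Framework details aside, the compress-or-pass-through decision in the pseudo-code above can at least be pinned down as a plain function. This is only a sketch with my own names (real responses have no `.store` attribute, so in practice the flag would have to travel some other way, e.g. a custom response header):

```python
import gzip


def apply_gzip_policy(body: bytes, already_compressed: bool):
    """Return (body, extra_headers).

    Hypothetical helper, not a FastAPI/Starlette API: compress fresh
    bodies, and pass cached gzip bytes through unchanged.
    """
    headers = {"Content-Encoding": "gzip"}
    if already_compressed:
        return body, headers          # cache hit: bytes are already gzip
    return gzip.compress(body), headers
```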
Thanks!

mod-wsgi hangs during posting with https and large data packages

I created an https environment on Win10 with Apache24 + OpenSSL (1.1.1) + mod-wsgi (4.5.24+ap24vc14).
It works well for http POSTs (no matter how big the posted data package is), but I ran into a problem with https POSTs.
For https posting:
When the client and the server are the same local machine, it also works well no matter how big the posted data package is.
When the client is a different machine in the same domain, it also works well for small or medium data packages, maybe less than 3M; I have no precise number.
When the client is a different machine in the same domain and posts relatively big data packages, about 5M or 6M, then after several initial successful posts the program hangs on the server at body=environ['wsgi.input'].read(length), with no response and no error (rarely it succeeds after a long time, but mostly it hangs until the connection times out).
When debugging the client and the server, the runtime values of length are correct and identical on both sides.
It seems body=environ['wsgi.input'].read(length) ends up in sys.stdin.buffer.read(length), but I still can't find the root cause or a solution.
Client code:
import json
import requests
import base64
import requests.packages.urllib3.util.ssl_
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ALL'
url="https://192.168.0.86"
# url="http://192.168.0.86"
f_img=open("./PICs/20191024142412.jpg",'rb')
# f_img=open("./PICs/20191023092645.jpg",'rb')
json_data={'type':'idpic','image':str(base64.b64encode(f_img.read()),'utf-8')}
result = requests.post(url,json=json_data,verify=False)
result_data=json.loads(result.content)
print(result_data)
Part of server codes:
class WSGICopyBody(object):
def __init__(self, application):
self.application = application
def __call__(self, environ, start_response):
from io import StringIO, BytesIO
length = environ.get('CONTENT_LENGTH', '0')
length = 0 if length == '' else int(length)
body = environ['wsgi.input'].read(length)
environ['body_copy'] = body
environ['wsgi.input'] = BytesIO(body)
app_iter = self.application(environ,self._sr_callback(start_response))
return app_iter
def _sr_callback(self, start_response):
def callback(status, headers, exc_info=None):
start_response(status, headers, exc_info)
return callback
app = Flask(__name__)
app.wsgi_app = WSGICopyBody(app.wsgi_app)
#app.route('/',methods=['POST'])
#app.route('/picserver',methods=['POST'])
def picserver():
print("before request.get_data")
request_json_data = request.environ['body_copy']
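One thing worth trying (an assumption on my part, not a verified fix for the mod-wsgi/SSL hang) is reading wsgi.input in smaller chunks instead of one large read, so a stall surfaces at a specific chunk rather than blocking a single multi-megabyte read:

```python
from io import BytesIO


def read_body(stream, length, chunk_size=64 * 1024):
    """Read up to `length` bytes from a wsgi.input-like stream in
    fixed-size chunks (the chunk size is an arbitrary choice)."""
    buf = BytesIO()
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # client went away before sending everything
            break
        buf.write(chunk)
        remaining -= len(chunk)
    return buf.getvalue()
```

In the middleware above, `body = read_body(environ['wsgi.input'], length)` would replace the single `.read(length)` call.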

Flask/Werkzeug equivalence of web.py app.request()

I'm porting an app from web.py to Flask, mainly because web.py support for Python 3 is spotty and there seems to be less and less interest in web.py.
But what I can't find in Flask/Werkzeug is a way to use the router to do dispatching of internal requests within my application. The app is structured such that there will be a lot of intra-application calls, and in web.py I handle these more or less as follows:
app = web.application(....)

def callUrl(url, method, env, data):
    parsedUrl = urllib.parse.urlparse(url)
    if parsedUrl.scheme == '' and parsedUrl.netloc == '':
        # local call
        res = app.request(url, method=method, data=data, env=env)
        ...
    else:
        assert env == {}
        res = requests.request(url, method=method, data=data)
        ....
I am trying to find a way to do something similar with Flask, but apparently I am looking in the wrong places. Can someone point me in the right direction?
Ok, answering my own question. The solution I chose was to basically re-implement app.request from web.py by filling an environ dictionary with all the required WSGI variables (REQUEST_METHOD, PATH_INFO, etc), including wsgi.input as an io.BytesIO() object that feeds the correct data into the WSGI app.
Then I created a suitable start_response() method to save the headers, and called
resultData = app.wsgi_app(environ, start_response)
The flask app goes through all the motions of pushing requests and environments and does all the routing, and I get the returned data back in resultData (and the headers with any errors and such have been passed to my start_response method).
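For reference, Werkzeug ships a ready-made version of this in werkzeug.test (EnvironBuilder and Client). A hand-rolled, stdlib-only sketch of what the answer describes looks roughly like this (all names here are my own, not Flask APIs):

```python
import io
from urllib.parse import urlsplit
from wsgiref.util import setup_testing_defaults


def internal_request(wsgi_app, url, method="GET", data=b""):
    """Dispatch a request to a WSGI app in-process, roughly what
    web.py's app.request() does."""
    parts = urlsplit(url)
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": parts.path or "/",
        "QUERY_STRING": parts.query,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `internal_request(app.wsgi_app, "/some/path")` on a Flask app runs the full routing machinery without any network round trip.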

In Python, how to keep track of variable in the response of a REST API for changes?

I have a REST API, which returns a JSON response. I need to keep track of one of the fields of the response, and listen for any changes in the value of this field. If the value reaches a certain threshold, I need to perform some task (say print an alert message). How can I accomplish this? Right now, I have a daemon which runs periodically, making an HTTP request and obtaining the value. What is the correct approach to do this, if I want to perform the action the moment the variable reaches the threshold?
This is what I currently have -
import daemon, time
import requests
from daemon import runner

NUMBER_OF_MINUTES = 10
threshold = 100  # placeholder value; use whatever limit applies

def doSomething():
    print("Yay, we got there")

def getSomeData():
    url = "www.somewebsite.com/getdata?id=somevalue&name=someothervalue"
    response = requests.get(url)
    json_data = response.json()
    myField = json_data['somefield']
    if myField > threshold:
        doSomething()

def run():
    while True:
        getSomeData()
        time.sleep(60 * NUMBER_OF_MINUTES)

if __name__ == '__main__':
    run()
Do you have control of the API? If so, add a websocket endpoint that your frontend app can connect to. Your API can then let your frontend app know whenever the value changes, through whatever data structure is appropriate.
If you don't have control of the API, your current polling solution is about as good as it gets.

Can a WSGI middleware modify a request body and then pass it along?

I'm trying to write a middleware that rewrites POST requests to a different method when it finds a "_method" parameter in the body. Someone on the Internet wrote this piece of code:
from werkzeug import Request

class MethodRewriteMiddleware(object):
    def __init__(self, app, input_name='_method'):
        self.app = app
        self.input_name = input_name

    def __call__(self, environ, start_response):
        request = Request(environ)
        if self.input_name in request.form:
            method = request.form[self.input_name].upper()
            if method in ['GET', 'POST', 'PUT', 'DELETE']:
                environ['REQUEST_METHOD'] = method
        return self.app(environ, start_response)
As I understand the code, it parses the form, retrieves a possible "_method" parameter, and if it's found and whitelisted it overwrites the current method. It works fine for DELETE requests and rewrites the method with no problems. However, when I try to send a regular, non-rewritten POST, this middleware makes the whole app hang. My best guess is that since I accessed the body in the middleware, the body is no longer available to the application, so it hangs forever. However, this doesn't seem to affect rewritten requests, so the deepest code path (checking the whitelist) works correctly, but the other code path is somehow destroying or blocking the request.
I don't think it's relevant, but I'm mounting this middleware on top of a Flask app.
EDIT: I think trying to access the request from a handler in Flask is blocking. Does Flask use mutexes or something like that internally?
I'm not even sure how to debug this.
environ is a dictionary whose wsgi.input key is a stream object. The manual way to do what you're looking for is to read the data, make any changes, and then replace the old stream in the environ dictionary with a new one built from the modified bytes. The example below (without any error handling or other important things) lets you work with the request body:
import io

def __call__(self, environ, start_response):
    body = environ['wsgi.input'].read()
    ... do stuff to body ...
    new_stream = io.BytesIO(modified_body)
    environ['wsgi.input'] = new_stream
    # keep CONTENT_LENGTH consistent with the modified body
    environ['CONTENT_LENGTH'] = str(len(modified_body))
    return self.app(environ, start_response)
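Putting the pieces together, here is a runnable, stdlib-only sketch of the whole method-rewrite idea (my own implementation, not the werkzeug version quoted in the question; it only understands urlencoded bodies, not multipart):

```python
import io
from urllib.parse import parse_qs


class MethodRewriteMiddleware:
    """Buffer the body, honor a `_method` form field, then re-seed
    environ['wsgi.input'] so the downstream app can still read it."""

    def __init__(self, app, input_name='_method'):
        self.app = app
        self.input_name = input_name

    def __call__(self, environ, start_response):
        if environ.get('REQUEST_METHOD') == 'POST':
            length = int(environ.get('CONTENT_LENGTH') or 0)
            body = environ['wsgi.input'].read(length)
            fields = parse_qs(body.decode('utf-8', 'replace'))
            method = fields.get(self.input_name, [''])[0].upper()
            if method in ('GET', 'POST', 'PUT', 'DELETE'):
                environ['REQUEST_METHOD'] = method
            # crucial: hand the app a fresh, readable copy of the body,
            # otherwise the app blocks waiting on an already-drained stream
            environ['wsgi.input'] = io.BytesIO(body)
            environ['CONTENT_LENGTH'] = str(len(body))
        return self.app(environ, start_response)
```

Because the body is re-seeded on every POST, both the rewritten and the plain code paths leave the request readable.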
