How to handle "413: Request Entity Too Large" in a Python Flask server

I'm using Flask-uploads to upload files to my Flask server. The max size allowed is set by using flaskext.uploads.patch_request_class(app, 16 * 1024 * 1024).
My client application (a unit test) uses requests to post a file that is too large.
I can see that my server returns an HTTP response with status 413: Request Entity Too Large. But the client raises an exception inside the requests code:
ConnectionError: HTTPConnectionPool(host='api.example.se', port=80): Max retries exceeded with url: /images (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
My guess is that the server disconnects the receiving socket and sends the response back to the client. But when the client hits the broken sending socket, it raises an exception and never reads the response.
Questions:
Is my guess about Flask-Uploads and requests correct?
Do Flask-Uploads and requests handle the 413 error correctly?
Should I expect my client code to get back some HTML when the post is too large?
Update
Here is a simple example reproducing my problem.
server.py
from flask import Flask, request

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024

@app.route('/post', methods=('POST',))
def view_post():
    return request.data

app.run(debug=True)
client.py
from tempfile import NamedTemporaryFile
import requests

def post(size):
    print "Post with size %s" % size,
    f = NamedTemporaryFile(delete=False, suffix=".jpg")
    for i in range(0, size):
        f.write("CoDe")
    f.close()

    # Post
    files = {'file': ("tempfile.jpg", open(f.name, 'rb'))}
    r = requests.post("http://127.0.0.1:5000/post", files=files)
    print "gives status code = %s" % r.status_code

post(16)
post(40845)
post(40846)
result from client
Post with size 16 gives status code = 200
Post with size 40845 gives status code = 413
Post with size 40846
Traceback (most recent call last):
File "client.py", line 18, in <module>
post(40846)
File "client.py", line 13, in post
r = requests.post("http://127.0.0.1:5000/post", files=files)
File "/opt/python_env/renter/lib/python2.7/site-packages/requests/api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "/opt/python_env/renter/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/python_env/renter/lib/python2.7/site-packages/requests/sessions.py", line 357, in request
resp = self.send(prep, **send_kwargs)
File "/opt/python_env/renter/lib/python2.7/site-packages/requests/sessions.py", line 460, in send
r = adapter.send(request, **kwargs)
File "/opt/python_env/renter/lib/python2.7/site-packages/requests/adapters.py", line 354, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /post (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
my versions
$ pip freeze
Flask==0.10.1
Flask-Mail==0.9.0
Flask-SQLAlchemy==1.0
Flask-Uploads==0.1.3
Jinja2==2.7.1
MarkupSafe==0.18
MySQL-python==1.2.4
Pillow==2.1.0
SQLAlchemy==0.8.2
Werkzeug==0.9.4
blinker==1.3
itsdangerous==0.23
passlib==1.6.1
python-dateutil==2.1
requests==2.0.0
simplejson==3.3.0
six==1.4.1
virtualenv==1.10.1
voluptuous==0.8.1
wsgiref==0.1.2

Flask is closing the connection. You can set an error handler for the 413 error:

@app.errorhandler(413)
def request_entity_too_large(error):
    return 'File Too Large', 413

Now the client should get a 413 error. Note that I didn't test this code.
Update:
I tried recreating the 413 error, and I didn't get a ConnectionError exception.
Here's a quick example:
from flask import Flask, request

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024

@app.route('/post', methods=('POST',))
def view_post():
    return request.data

app.run(debug=True)
After running the file, I used the interactive interpreter to test posting small and large data with requests:
>>> import requests
>>> r = requests.post('http://127.0.0.1:5000/post', data={'foo': 'a'})
>>> r
<Response [200]>
>>> r = requests.post('http://127.0.0.1:5000/post', data={'foo': 'a'*10000})
>>> r
<Response [413]>
>>> r.status_code
413
>>> r.content
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>413 Request Entity Too Large</title>\n<h1>Request Entity Too Large</h1>\n<p>The data value transmitted exceeds the capacity limit.</p>\n'
As you can see, we got Flask's 413 error response back and requests didn't raise an exception.
By the way, I'm using:
Flask: 0.10.1
Requests: 2.0.0

RFC 2616, the specification for HTTP 1.1, says:
10.4.14 413 Request Entity Too Large
The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.
If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.
This is what's happening here: flask is closing the connection to prevent the client from continuing the upload, which is giving you the Broken pipe error.
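If the client needs to cope with this, one option (a minimal sketch, not from the answers above; the size limit and helper name are assumptions) is to check the file size up front and treat a mid-upload ConnectionError as a probable 413:

import os
import requests

MAX_UPLOAD = 16 * 1024 * 1024  # should match the server's MAX_CONTENT_LENGTH

def post_file(url, path):
    # Checking the size locally avoids hitting the server limit at all.
    if os.path.getsize(path) > MAX_UPLOAD:
        raise ValueError("file exceeds the server's upload limit")
    try:
        with open(path, 'rb') as f:
            return requests.post(url, files={'file': f})
    except requests.exceptions.ConnectionError:
        # The server may abort the connection mid-upload on an oversized body,
        # which surfaces here as a broken pipe instead of a 413 response.
        return None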

Based on this GitHub issue answer (https://github.com/benoitc/gunicorn/issues/1733#issuecomment-377000612):
@app.before_request
def handle_chunking():
    """
    Sets the "wsgi.input_terminated" environment flag, thus enabling
    Werkzeug to pass chunked requests as streams. The gunicorn server
    should set this, but it's not yet been implemented.
    """
    transfer_encoding = request.headers.get("Transfer-Encoding", None)
    if transfer_encoding == u"chunked":
        request.environ["wsgi.input_terminated"] = True
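As a hypothetical follow-up (not part of the linked issue), once the flag is set a view can consume the chunked body incrementally through request.stream; the route name below is made up:

from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # With wsgi.input_terminated set, Werkzeug can stream a chunked body
    # even though the request carries no Content-Length header.
    total = 0
    chunk = request.stream.read(8192)
    while chunk:
        total += len(chunk)
        chunk = request.stream.read(8192)
    return "received %d bytes\n" % total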

Related

Send python output (json output) to telegraf over http_listener_v2

I am trying to post the test_data output from the Python script below (abc.py) to telegraf over http_listener_v2.
abc.py
------
import requests
url = "https://testazure.local/telegraf"
test_data = {'key1': 'value1'}
op = requests.post(url, data = test_data)
print(op.test_data)
Here are the snippets from my telegraf.conf file for inputs.
telegraf.conf
-----------------
[[inputs.http_listener_v2]]
# ## Address and port to host HTTP listener on
# methods = ["POST", "PUT"]
data_format = "json"
Not sure if I am passing all the requisite details to the input plugin.
I am getting a connection error when trying to execute my Python script. Any advice/help would be highly appreciated.
Traceback (most recent call last):
File "abc.py", line 6, in <module>
x = requests.post(url, data = test_data)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='testazure.local', port=443):
Max retries exceeded with url: /telegraf (Caused by
NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9eb8c72438>:
Failed to establish a new connection: [Errno 111] Connection refused',))
I think your URL needs http and not https. I have this exact setup and I am looking for an answer as well.
Actually, seeing your question, I was able to solve my own problem. I had the following code; notice the post method argument: json=
import requests
from json import dumps

host = "rpi4-2GB"
port = 8080
path = "/telegraf"
weather_dict = data  # the dict to send, defined elsewhere
url_str = "http://{}:{}{}".format(host, port, path)
http_res = requests.post(url=url_str, json=dumps(weather_dict))
>>> http_res.status_code
400
After reading your question here, I changed my kwarg to data:
url_str = "http://{}:{}{}".format(host, port, path)
http_res = requests.post(url=url_str, data=dumps(weather_dict))
>>> http_res.status_code
204
204 is the proper value to get back, in my experience with telegraf's http_listener_v2 plugin.
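For what it's worth, the 400 most likely came from double-encoding: passing an already-serialized string to json= makes requests JSON-encode it a second time, so telegraf receives a quoted string instead of an object. A hedged sketch of the variants (the address and payload are placeholders):

import json
import requests

url = "http://rpi4-2GB:8080/telegraf"   # placeholder address
weather_dict = {"temperature": 21.5}    # placeholder payload

# Double-encoded: the body becomes a JSON string literal, which telegraf's
# json data_format cannot parse as an object (the 400 above).
bad = requests.post(url, json=json.dumps(weather_dict))

# Either of these sends the JSON object itself (the 204 above; the second
# form is untested here but lets requests serialize and set Content-Type):
ok1 = requests.post(url, data=json.dumps(weather_dict))
ok2 = requests.post(url, json=weather_dict)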

Python Requests post times out despite timeout setting

I am using the Python Requests module (v. 2.19.1) with Python 3.4.3, calling a function on a remote server that generates a .csv file for download. In general, it works perfectly. There is one particular file that takes >6 minutes to complete, and no matter what I set the timeout parameter to, I get an error after exactly 5 minutes trying to generate that file.
import requests
s = requests.Session()
authPayload = {'UserName': 'myloginname','Password': 'password'}
loginURL = 'https://myremoteserver.com/login/authenticate'
login = s.post(loginURL, data=authPayload)
backupURL = 'https://myremoteserver.com/directory/jsp/Backup.jsp'
payload = {'command': fileCommand}
headers = {'Connection': 'keep-alive'}
post = s.post(backupURL, data=payload, headers=headers, timeout=None)
This times out after exactly 5 minutes with the error:
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 612, in urlopen
raise MaxRetryError(self, url, e)
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='myremoteserver.com', port=443): Max retries exceeded with url: /directory/jsp/Backup.jsp (Caused by <class 'http.client.BadStatusLine'>: '')
If I set timeout to something much smaller, say, 5 seconds, I get an error that makes perfect sense:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='myremoteserver.com', port=443): Read timed out. (read timeout=5)
If I run the process from a browser, it works fine, so it doesn't seem like it's the remote server closing the connection, or a firewall or something in-between closing the connection.
Posted at the request of the OP: my comments on the original question pointed to a related SO problem.
The clue to the problem lies in the http.client.BadStatusLine error.
Take a look at the following related SO Q & A that discusses the impact of proxy servers on HTTP requests and responses.
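One hedged way to test the proxy hypothesis (not from the linked answer; the login step from the question is omitted) is to repeat the request with environment-provided proxy settings ignored, using the names backupURL, payload, and headers from the question:

import requests

s = requests.Session()
# Ignore HTTP(S)_PROXY environment variables so the request goes direct;
# if the 5-minute cutoff disappears, an intermediary proxy was the culprit.
s.trust_env = False
post = s.post(backupURL, data=payload, headers=headers, timeout=None)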

How can I make sure a BS4 request is being made with a socket on a list?

I have a list of proxies like this one that I would like to use when scraping with Python:
proxies_ls = [ '149.56.89.166:3128',
'194.44.176.116:8080',
'14.203.99.67:8080',
'185.87.65.204:63909',
'103.206.161.234:63909',
'110.78.177.100:65103']
and I made a function called crawlSite(url) to scrape a URL using the bs4 and requests modules. Here's the code:
# Libraries for crawling and regex
from bs4 import BeautifulSoup
import requests
from fake_useragent import UserAgent
import re
# Library for dates
import datetime
from time import gmtime, strftime
# Library for writing logs
import os
import errno
# Library for random delays
import time
import random

print('BOT started: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))

proxies_ls = ['149.56.89.166:3128',
              '194.44.176.116:8080',
              '14.203.99.67:8080',
              '185.87.65.204:63909',
              '103.206.161.234:63909',
              '110.78.177.100:65103']

def crawlSite(url):
    # Chrome emulation
    ua = UserAgent()
    header = {'user-agent': ua.chrome}
    random.shuffle(proxies_ls)
    # Random delay
    print('before the delay: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))
    tempoRandom = random.randint(1, 5)
    time.sleep(tempoRandom)
    try:
        randProxy = random.choice(proxies_ls)
        # Getting the webpage, creating a Response object emulated with Chrome with a 30 sec timeout.
        response = requests.get(url, proxies={'https': randProxy}, headers=header, timeout=30)
        print(response)
        print('Response received: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))
        # Avoid HTTP request errors
        if response.status_code == 404:
            raise ConnectionError("HTTP Response [404] - The requested resource could not be found")
        elif response.status_code == 409:
            raise ConnectionError("HTTP Response [409] - Possible Cloudflare DNS resolution error")
        elif response.status_code == 403:
            raise ConnectionError("HTTP Response [403] - Permission denied error")
        elif response.status_code == 503:
            raise ConnectionError("HTTP Response [503] - Service unavailable error")
        print('RR Status {}'.format(response.status_code))
        # Extracting the source code of the page.
        data = response.text
    except ConnectionError:
        try:
            proxies_ls.remove(randProxy)
        except ValueError:
            pass
        randProxy = random.choice(proxies_ls)
    return BeautifulSoup(data, 'lxml')
What I would like to do is to make sure only the proxies on that list are being used in the connection.
The random part
randProxy=random.choice(proxies_ls)
is working OK, but the part that checks whether the proxy is valid or not isn't, mainly because I still get 200 as the response with a "made-up proxy".
If I reduce the list to this:
proxies_ls = ['149.56.89.166:3128']
with a proxy that doesn't work, I still get 200 as the response! (I tried using a proxy checker like https://pt.infobyip.com/proxychecker.php and it doesn't work...)
So my questions are (I'll enumerate so it is easier):
a) Why am I getting this 200 response and not a 4xx response?
b) How can I force the request to use the proxies as I want?
Thank you,
Eunito.
Read the docs carefully (http://docs.python-requests.org/en/master/user/advanced/#proxies). You have to specify the following things in the dictionary:
What protocol to use the proxy for
What protocol the proxy uses
The address and port of the proxy
A "working" dict should look as follows:
proxies = {
    'https': 'socks5://localhost:9050'
}
This will proxy ONLY and ALL https requests. This means it will NOT proxy http.
So to proxy all webtraffic, you should configure your dict like so:
proxies = {
    'https': 'socks5://localhost:9050',
    'http': 'socks5://localhost:9050'
}
Substitute IP addresses where necessary, of course. See the following example for what happens otherwise:
$ python
>>> import requests
>>> proxies = {'https':'http://149.58.89.166:3128'}
>>> # Get a HTTP page (this goes around the proxy)
>>> response = requests.get("http://www.example.com/",proxies=proxies)
>>> response.status_code
200
>>> # Get a HTTPS page (so it goes through the proxy)
>>> response = requests.get("https://www.example.com/", proxies=proxies)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 485, in send
raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.example.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f7d1f448c10>: Failed to establish a new connection: [Errno 110] Connection timed out',)))
So basically, if I got your question right, you just want to check whether the proxy is valid or not. requests has an exception for exactly that; you can do something like this:
from requests.exceptions import ProxyError

try:
    response = requests.get(url, proxies={'https': randProxy}, headers=header, timeout=30)
except ProxyError:
    # message that the proxy is invalid
    pass
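Building on that, a hedged way to honour the "only proxies from my list" requirement is to pre-filter the list before crawling; the test URL and the extra exception types here are assumptions, not from the answer above:

import requests
from requests.exceptions import ProxyError, ConnectTimeout, ConnectionError

def working_proxies(proxy_list, test_url="https://httpbin.org/ip", timeout=10):
    # Keep only the proxies that can actually complete an HTTPS request.
    good = []
    for p in proxy_list:
        try:
            requests.get(test_url, proxies={'https': 'http://' + p}, timeout=timeout)
            good.append(p)
        except (ProxyError, ConnectTimeout, ConnectionError):
            pass
    return good

proxies_ls = working_proxies(proxies_ls)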

django send request to self

There are two attempts to get a response from a "working" Django server. The working version is hardcoded; the other one does not work while unit testing.
# working
# a = requests.post('http://localhost:8000/ImportKeys/',
#                   data=json.dumps({'user_id': key_obj.email,
#                                    'key': self.restore_pubkey(key_obj.fingerprint)}))

# not working
a = requests.post('http://' + request.get_host() + reverse('import_keys'),
                  data=json.dumps({'user_id': key_obj.email,
                                   'key': self.restore_pubkey(key_obj.fingerprint)}))
With the version that I want to start working, I get this (end of the stack trace):
File "/home/PycharmProjects/lib/python3.4/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/home/PycharmProjects/lib/python3.4/site-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='testserver', port=80): Max retries exceeded with url: /ImportKeys/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))
And yes, I see that it's trying to connect to port 80, and this is bad.
To test your views in the TestCase classes, use django.test.Client, which is designed specifically for that purpose. If you inherit your test cases from django.test.TestCase, it's already available via the self.client attribute.
class YourTestCase(TestCase):

    def test_import_keys_posting(self):
        data = {
            'user_id': key_obj.email,
            'key': self.restore_pubkey(key_obj.fingerprint)
        }
        response = self.client.post(reverse('import_keys'), data)
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.json(), {'result': 'ok'})
And if you use Django Rest Framework, consider using its wonderful APIClient, which simplifies API testing even more.
If you need to send requests to the server during tests (in that case this will probably not be from the test code itself but from some mock or from JS code):
Extend LiveServerTestCase instead of TestCase. This will launch an actual server during the tests.
If you are using request.build_absolute_uri() in your regular code which is being tested, you need to change the test code to update the HTTP request headers accordingly like this:
from urllib import parse

checkout_url = '{}{}'.format(self.live_server_url, reverse('checkout', kwargs={'pk': article.id}))
parsed_url = parse.urlparse(self.live_server_url)
# add the info on host and port to the http header to make subsequent
# request.build_absolute_uri() calls work
response = self.client.get(checkout_url, SERVER_NAME=parsed_url.hostname, SERVER_PORT=parsed_url.port)
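For completeness, a minimal sketch of the live-server approach, assuming the 'import_keys' URL name from the question, a Django version where reverse lives in django.urls, and placeholder payload values:

import json

import requests
from django.test import LiveServerTestCase
from django.urls import reverse  # django.core.urlresolvers on older Django

class ImportKeysLiveTest(LiveServerTestCase):

    def test_import_keys_over_http(self):
        # self.live_server_url is the real address of the test server,
        # e.g. "http://localhost:54321", so requests can actually reach it.
        url = self.live_server_url + reverse('import_keys')
        response = requests.post(
            url,
            data=json.dumps({'user_id': 'user@example.com', 'key': 'PUBKEY...'}))
        self.assertEqual(response.status_code, 200)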

Seeing retry of a request sent using urllib3.PoolManager without retries configured

I have some python code that looks like the following:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE')
...
full_url = 'https://[%s]:%d%s%s' % \
    (address, port, base_uri, relative_uri)
kwargs = {
    'headers': {
        'Host': '%s:%d' % (hostname, port)
    }
}
if data is not None:
    kwargs['body'] = json.dumps(data, indent=2, sort_keys=True)
# Directly use request_encode_url instead of request because requests
# will try to encode the body as 'multipart/form-data'.
response = http.request_encode_url('POST', full_url, **kwargs)
log.debug('Received response: HTTP status %d. Body: %s' %
          (response.status, repr(response.data)))
I have a log line that prints once prior to the code that issues the request, and the log.debug('Received...') line prints once. However, on the server side, I occasionally see two requests (they are both the same POST request that is sent by this code block), around 1-5 seconds apart. In such instances, the order of events is as follows:
One request sent from python client
First request received
Second request received
First response sent with status 200 and an http entity indicating success
Second response sent with status 200 and http entity indicating failure
Python client receives the second response
I tried to reproduce it reliably by sleeping in the server (guessing that there might be a timeout that causes a retry), but was unsuccessful. I believe the duplication is unlikely to be occurring on the server because it's just a basic Scala Spray server, and I haven't seen this with other clients. Looking at the source code for PoolManager, I can't find anywhere where retries would be included. There is a mechanism for retries specified via an optional parameter, but this optional parameter is not being used in the code above.
Does anyone have any ideas where this extra request might be coming from?
EDIT: @shazow gave a pointer about retries having a default of 3, but I changed the code as suggested and got the following error:
Traceback (most recent call last):
File "my_file.py", line 23, in <module>
response = http.request_encode_url('GET', full_url, **kwargs)
File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 88, in request_encode_url
return self.urlopen(method, url, **urlopen_kw)
File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 145, in urlopen
conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 119, in connection_from_host
pool = self._new_pool(scheme, host, port)
File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 86, in _new_pool
return pool_cls(host, port, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'retries'
Edit #2: The following change to kwargs seems to work for me:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE')
...
full_url = 'https://[%s]:%d%s%s' % \
    (address, port, base_uri, relative_uri)
kwargs = {
    'headers': {
        'Host': '%s:%d' % (hostname, port)
    },
    'retries': 0
}
if data is not None:
    kwargs['body'] = json.dumps(data, indent=2, sort_keys=True)
# Directly use request_encode_url instead of request because requests
# will try to encode the body as 'multipart/form-data'.
response = http.request_encode_url('POST', full_url, **kwargs)
log.debug('Received response: HTTP status %d. Body: %s' %
          (response.status, repr(response.data)))
urllib3 has a default retries configuration, which is the equivalent of Retry(3). To disable retries outright, you'll need to pass retries=False either when constructing the pool or when making a request.
Something like this should work, for example:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE', retries=False)
...
The default retries setting (as defined here) could definitely be better documented; I would appreciate your contribution if you feel up for it. :)
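As a hedged variant of the same idea, retries can also be disabled for a single request without touching the pool's defaults (the URL and body below are placeholders):

import urllib3

http = urllib3.PoolManager(cert_reqs='CERT_NONE')

# Per-request alternative: retries=False disables retries for this call only.
response = http.request_encode_url(
    'POST', 'https://example.com/endpoint',
    body='{"hello": "world"}',
    retries=False)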
