I have some python code that looks like the following:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE')
...
full_url = 'https://[%s]:%d%s%s' % \
    (address, port, base_uri, relative_uri)
kwargs = {
    'headers': {
        'Host': '%s:%d' % (hostname, port)
    }
}
if data is not None:
    kwargs['body'] = json.dumps(data, indent=2, sort_keys=True)
# Directly use request_encode_url instead of request because requests
# will try to encode the body as 'multipart/form-data'.
response = http.request_encode_url('POST', full_url, **kwargs)
log.debug('Received response: HTTP status %d. Body: %s' %
          (response.status, repr(response.data)))
I have a log line that prints once prior to the code that issues the request, and the log.debug('Received...') line prints once. However, on the server side, I occasionally see two requests (they are both the same POST request that is sent by this code block), around 1-5 seconds apart. In such instances, the order of events is as follows:
One request sent from python client
First request received
Second request received
First response sent with status 200 and an http entity indicating success
Second response sent with status 200 and http entity indicating failure
Python client receives the second response
I tried to reproduce it reliably by sleeping in the server (guessing that there might be a timeout that causes a retry), but was unsuccessful. I believe the duplication is unlikely to be occurring on the server, because it's just a basic Scala Spray server and I haven't seen this with other clients. Looking at the source code for PoolManager, I can't find anywhere that retries would be introduced. There is a mechanism for retries via an optional parameter, but that optional parameter is not being used in the code above.
Does anyone have any ideas where this extra request might be coming from?
EDIT: @shazow gave a pointer about retries having a default of 3, but when I changed the code as suggested I got the following error:
Traceback (most recent call last):
  File "my_file.py", line 23, in <module>
    response = http.request_encode_url('GET', full_url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 88, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 145, in urlopen
    conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
  File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 119, in connection_from_host
    pool = self._new_pool(scheme, host, port)
  File "/usr/lib/python2.7/dist-packages/urllib3/poolmanager.py", line 86, in _new_pool
    return pool_cls(host, port, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'retries'
Edit #2: The following change to kwargs seems to work for me:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE')
...
full_url = 'https://[%s]:%d%s%s' % \
    (address, port, base_uri, relative_uri)
kwargs = {
    'headers': {
        'Host': '%s:%d' % (hostname, port)
    },
    'retries': 0
}
if data is not None:
    kwargs['body'] = json.dumps(data, indent=2, sort_keys=True)
# Directly use request_encode_url instead of request because requests
# will try to encode the body as 'multipart/form-data'.
response = http.request_encode_url('POST', full_url, **kwargs)
log.debug('Received response: HTTP status %d. Body: %s' %
          (response.status, repr(response.data)))
urllib3 has a default retries configuration, which is equivalent to Retry(3). To disable retries outright, you'll need to pass retries=False either when constructing the pool or when making a request.
Something like this should work, for example:
import urllib3
http = urllib3.PoolManager(cert_reqs='CERT_NONE', retries=False)
...
The default retries setting (as defined here) could definitely be better documented; I would appreciate your contribution if you feel up for it. :)
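For completeness, the same thing can be done per request, which is what Edit #2 above relies on. Here is a minimal sketch with a placeholder URL and body (older urllib3 releases only accept an integer for retries, while newer ones also accept False or a Retry instance):

import urllib3

http = urllib3.PoolManager(cert_reqs='CERT_NONE')

# Disable retries for this one request only; other requests made through
# the pool keep urllib3's default behaviour.
response = http.request_encode_url(
    'POST',
    'https://example.invalid/endpoint',  # placeholder URL
    body='{"key": "value"}',             # placeholder body
    retries=0,
)
print(response.status)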
PYTHON
import requests
url = "https://REDACTED/pb/s/api/auth/login"
r = requests.post(
    url,
    data={
        'username': 'username',
        'password': 'password'
    }
)
NIM
import httpclient, json
let client = newHttpClient()
client.headers = newHttpHeaders({ "Content-Type": "application/json" })
let body = %*{
  "username": "username",
  "password": "password"
}
let resp = client.request("https://REDACTED.com/pb/s/api/auth/login", httpMethod = httpPOST, body = $body)
echo resp.body
I'm calling an API to get some data. Running the python code I get the traceback below. However, the nim code works perfectly so there must be something wrong with the python code or setup.
I'm running Python version 2.7.15.
requests lib version 2.19.1
Traceback (most recent call last):
  File "C:/Python27/testht.py", line 21, in <module>
    "Referer": "https://REDACTED.com/pb/a/"
  File "C:\Python27\lib\site-packages\requests\api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 511, in send
    raise SSLError(e, request=request)
SSLError: HTTPSConnectionPool(host='REDACTED.com', port=443): Max retries exceeded with url: /pb/s/api/auth/login (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))
The requests module will verify the cert it gets from the server, much like a browser would. Rather than being able to click through and say "add exception" like you would in your browser, requests will raise that exception.
There's a way around it though: try adding verify=False to your post call.
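For example, a minimal sketch of that suggestion (note that verify=False disables certificate checking entirely, so only use it if you accept that risk):

import requests

url = "https://REDACTED/pb/s/api/auth/login"
r = requests.post(
    url,
    data={
        'username': 'username',
        'password': 'password'
    },
    verify=False  # skip certificate verification, matching the Nim default
)
print(r.status_code)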
However, the nim code works perfectly so there must be something wrong with the python code or setup.
Actually, your Python code or setup is not really to blame; the difference lies in the Nim code, or rather in the defaults of its httpclient library. The Nim documentation shows that httpclient.request by default uses an SSL context returned by getDefaultSSL, which, according to this code, creates a context that does not verify the certificate:
proc getDefaultSSL(): SSLContext =
  result = defaultSslContext
  when defined(ssl):
    if result == nil:
      defaultSSLContext = newContext(verifyMode = CVerifyNone)
Your Python code instead attempts to properly verify the certificate since the requests library does this by default. And it fails to verify the certificate because something is wrong - either with your setup or the server.
It is unclear who has issued the certificate for your site but if it is not in your default CA store you can use the verify argument of requests to specify the issuer CA. See this documentation for details.
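For instance, if the issuing CA's certificate is available as a PEM file, something along these lines should work (the file path is just a placeholder):

import requests

# Point requests at the CA that issued the server's certificate.
# The path below is hypothetical; use wherever your CA bundle actually lives.
r = requests.post(
    "https://REDACTED/pb/s/api/auth/login",
    data={'username': 'username', 'password': 'password'},
    verify="/path/to/issuer-ca.pem"
)
print(r.status_code)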
If the site you are trying to access works with the browser but fails with your program it might be that it uses a special CA which was added as trusted to the browser (like a company certificate). Browsers and Python use different trust stores so this added certificate needs to be added to Python or at least to your program as trusted too. It might also be that the setup of the server has problems. Browsers can sometimes work around problems like a missing intermediate certificate but Python doesn't. In case of a public accessible site you could use SSLLabs to check what's wrong.
I am using the Python Requests module (v. 2.19.1) with Python 3.4.3, calling a function on a remote server that generates a .csv file for download. In general, it works perfectly. There is one particular file that takes >6 minutes to complete, and no matter what I set the timeout parameter to, I get an error after exactly 5 minutes trying to generate that file.
import requests
s = requests.Session()
authPayload = {'UserName': 'myloginname','Password': 'password'}
loginURL = 'https://myremoteserver.com/login/authenticate'
login = s.post(loginURL, data=authPayload)
backupURL = 'https://myremoteserver.com/directory/jsp/Backup.jsp'
payload = {'command': fileCommand}
headers = {'Connection': 'keep-alive'}
post = s.post(backupURL, data=payload, headers=headers, timeout=None)
This times out after exactly 5 minutes with the error:
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 612, in urlopen
raise MaxRetryError(self, url, e)
urllib3.exceptions.MaxRetryError: > HTTPSConnectionPool(host='myremoteserver.com', port=443): Max retries exceeded with url: /directory/jsp/Backup.jsp (Caused by < class 'http.client.BadStatusLine'>: '')
If I set timeout to something much smaller, say, 5 seconds, I get an error that makes perfect sense:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='myremoteserver.com', port=443): Read timed out. (read timeout=5)
If I run the process from a browser, it works fine, so it doesn't seem like it's the remote server closing the connection, or a firewall or something in-between closing the connection.
Posted at the request of the OP -- my comments on the original question pointed to a related SO problem
The clue to the problem lies in the http.client.BadStatusLine error.
Take a look at the following related SO Q & A that discusses the impact of proxy servers on HTTP requests and responses.
I've spent a total of 30 minutes in python lol, so take that into consideration when you answer lol:
I'm trying to send an HTTP POST request with a body and reading the response. I'm using Python 3.6.5 on Windows 10. This is what I have so far:
import http.client
import xml.dom.minidom

HOST = "www.mysite.com"
API_URL = "/service"

def do_request(xml_location):
    request = open(xml_location, "r").read()
    webservice = http.client.HTTPConnection(HOST)
    webservice.request("POST", API_URL)
    webservice.putheader("Host", HOST)
    webservice.putheader("User-Agent", "Python Post")
    webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.endheaders()
    webservice.send(request)
    statuscode, statusmessage, header = webservice.getreply()
    result = webservice.getfile().read()
    resultxml = xml.dom.minidom.parseString(result)
    print(statuscode, statusmessage, header)
    print(resultxml.toprettyxml())
    with open("output-%s" % xml_location, "w") as xmlfile:
        xmlfile.write(resultxml.toprettyxml())

do_request("test.xml")
test.xml contains the XML request. When I run, I get an error:
Traceback (most recent call last):
  File "C:\Users\xxx\Documents\test.py", line 33, in <module>
    do_request("test.xml")
  File "C:\Users\xxx\Documents\test.py", line 14, in do_request
    webservice.putheader("Host", HOST)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1201, in putheader
    raise CannotSendHeader()
http.client.CannotSendHeader
Your problem is that you mixed up the request and putrequest methods. (Not surprisingly, given the brevity and sparsity of the documentation… most modules in Python are documented a lot better than this, so don't let that worry you about the future.)
The request method is a convenience function that adds the request line, all the headers, and the data all in one go. After you've done that, it's way too late to add a header, hence the error message.
So, you can fix it either way.
(1) Change it to use putrequest. I realize there's no example using putrequest or putheader anywhere in the docs, but it looks like this:
webservice.putrequest("POST", API_URL)
webservice.putheader("Host", HOST)
webservice.putheader("User-Agent", "Python Post")
webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
webservice.putheader("Content-length", "%d" % len(request))
webservice.endheaders()
webservice.send(request)
(2) Change it to use request. This is what all the examples in the docs do; you just need to build up a dict of headers to pass to it:
headers = {
    "Host": HOST,
    "User-Agent": "Python Post",
    "Content-type": "text/xml; charset=\"UTF-8\"",
    "Content-length": "%d" % len(request)
}
webservice.request("POST", API_URL, headers=headers, body=request)
(3) Read this at the top of the docs:
This module defines classes which implement the client side of the HTTP and HTTPS protocols. It is normally not used directly — the module urllib.request uses it to handle URLs that use HTTP and HTTPS.
See also The Requests package is recommended for a higher-level HTTP client interface.
For most real-life cases, you want to use requests if you can use a third-party library, and urllib.request if you can't. They're both simpler, and better documented.
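Purely as an illustration (this is not part of the original answer), the same POST with requests could look roughly like this, assuming the same test.xml file and endpoint:

import requests
import xml.dom.minidom

HOST = "www.mysite.com"
API_URL = "/service"

with open("test.xml", "r") as f:
    payload = f.read()

# requests fills in Host and Content-Length for us; we only set the content type.
resp = requests.post(
    "http://" + HOST + API_URL,
    data=payload,
    headers={"Content-Type": 'text/xml; charset="UTF-8"'},
)
print(resp.status_code)
print(xml.dom.minidom.parseString(resp.text).toprettyxml())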
I have a list of proxies, like the one below, that I would like to use for scraping with Python:
proxies_ls = ['149.56.89.166:3128',
              '194.44.176.116:8080',
              '14.203.99.67:8080',
              '185.87.65.204:63909',
              '103.206.161.234:63909',
              '110.78.177.100:65103']
and I made a function called crawlSite(url) that scrapes a URL using the bs4 and requests modules. Here's the code:
# Libraries for crawling and regex
from bs4 import BeautifulSoup
import requests
from fake_useragent import UserAgent
import re
# Library for dates
import datetime
from time import gmtime, strftime
# Libraries for writing the logs
import os
import errno
# Libraries for random delays
import time
import random

print('BOT started: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))

proxies_ls = ['149.56.89.166:3128',
              '194.44.176.116:8080',
              '14.203.99.67:8080',
              '185.87.65.204:63909',
              '103.206.161.234:63909',
              '110.78.177.100:65103']

def crawlSite(url):
    # Chrome emulation
    ua = UserAgent()
    header = {'user-agent': ua.chrome}
    random.shuffle(proxies_ls)

    # Random delay
    print('before the delay: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))
    tempoRandom = random.randint(1, 5)
    time.sleep(tempoRandom)

    try:
        randProxy = random.choice(proxies_ls)
        # Getting the webpage, creating a Response object emulated with Chrome with a 30 sec timeout.
        response = requests.get(url, proxies={'https': randProxy}, headers=header, timeout=30)
        print(response)
        print('Response received: ' + datetime.datetime.now().strftime('%d-%m-%Y %H:%M:%S'))
        # Avoid HTTP request errors
        if response.status_code == 404:
            raise ConnectionError("HTTP Response [404] - The requested resource could not be found")
        elif response.status_code == 409:
            raise ConnectionError("HTTP Response [409] - Possible Cloudflare DNS resolution error")
        elif response.status_code == 403:
            raise ConnectionError("HTTP Response [403] - Permission denied error")
        elif response.status_code == 503:
            raise ConnectionError("HTTP Response [503] - Service unavailable error")
        print('RR Status {}'.format(response.status_code))
        # Extracting the source code of the page.
        data = response.text
    except ConnectionError:
        try:
            proxies_ls.remove(randProxy)
        except ValueError:
            pass
        randProxy = random.choice(proxies_ls)
    return BeautifulSoup(data, 'lxml')
What I would like to do is make sure that only the proxies on that list are used for the connection.
The random part
randProxy=random.choice(proxies_ls)
is working OK, but the part that checks whether the proxy is valid or not isn't, mainly because I still get 200 as the response even with a made-up proxy.
If I reduce the list to this:
proxies_ls = ['149.56.89.166:3128']
with a proxy that doesn't work, I still get 200 as the response! (I tried it with a proxy checker like https://pt.infobyip.com/proxychecker.php and the proxy doesn't work...)
So my questions are (I'll enumerate so it is easier):
a) Why am I getting this 200 response and not a 4xx response?
b) How can I force the request to use the proxies as I want?
Thank you,
Eunito.
Read the docs carefully: you have to specify the following things in the proxies dictionary:
http://docs.python-requests.org/en/master/user/advanced/#proxies
What protocol to use the proxy for
What protocol the proxy uses
The address and port of the proxy
A "working" dict should look as follows:
proxies = {
'https': 'socks5://localhost:9050'
}
This will proxy ONLY and ALL https requests. This means it will NOT proxy http.
So to proxy all webtraffic, you should configure your dict like so:
proxies = {
    'https': 'socks5://localhost:9050',
    'http': 'socks5://localhost:9050'
}
and substitute IP-addresses where necessary, of course. See the following example for what happens otherwise:
$ python
>>> import requests
>>> proxies = {'https':'http://149.58.89.166:3128'}
>>> # Get a HTTP page (this goes around the proxy)
>>> response = requests.get("http://www.example.com/",proxies=proxies)
>>> response.status_code
200
>>> # Get a HTTPS page (so it goes through the proxy)
>>> response = requests.get("https://www.example.com/", proxies=proxies)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 485, in send
raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.example.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f7d1f448c10>: Failed to establish a new connection: [Errno 110] Connection timed out',)))
So basically, if I got your question right, you just want to check if the proxy is valid or not. requests has an exception handler for that, you can do something like this:
from requests.exceptions import ProxyError

try:
    response = requests.get(url, proxies={'https': randProxy}, headers=header, timeout=30)
except ProxyError:
    # message that the proxy is invalid
    pass
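Building on that, here is a rough sketch of my own (not part of the original answer; working_proxies and TEST_URL are names I made up) that turns the bare 'ip:port' strings from proxies_ls into scheme-qualified entries and keeps only the ones that actually answer:

import requests

TEST_URL = "https://www.example.com/"  # any HTTPS page used purely for testing

def working_proxies(proxy_list):
    """Return only the proxies that can complete an HTTPS request."""
    good = []
    for entry in proxy_list:
        # Qualify the bare 'ip:port' string with a scheme for both protocols.
        proxies = {
            'http': 'http://' + entry,
            'https': 'http://' + entry,
        }
        try:
            requests.get(TEST_URL, proxies=proxies, timeout=10)
            good.append(entry)
        except requests.exceptions.RequestException:
            pass  # unreachable, refused, or timed out; drop it
    return good

proxies_ls = working_proxies(proxies_ls)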
I'm using Flask-uploads to upload files to my Flask server. The max size allowed is set by using flaskext.uploads.patch_request_class(app, 16 * 1024 * 1024).
My client application (a unit test) uses requests to post a file that is too large.
I can see that my server returns an HTTP response with status 413: Request Entity Too Large. But the client raises an exception in the requests code:
ConnectionError: HTTPConnectionPool(host='api.example.se', port=80): Max retries exceeded with url: /images (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
My guess is that the server disconnects the receiving socket and sends the response back to the client, but when the client sees the broken sending socket it raises an exception and skips the response.
Questions:
Is my guess about Flask-Uploads and requests correct?
Do Flask-Uploads and requests handle the 413 error correctly?
Should I expect my client code to get back some HTML when the post is too large?
Update
Here is a simple example reproducing my problem.
server.py
from flask import Flask, request

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024

@app.route('/post', methods=('POST',))
def view_post():
    return request.data

app.run(debug=True)
client.py
from tempfile import NamedTemporaryFile
import requests

def post(size):
    print "Post with size %s" % size,
    f = NamedTemporaryFile(delete=False, suffix=".jpg")
    for i in range(0, size):
        f.write("CoDe")
    f.close()
    # Post
    files = {'file': ("tempfile.jpg", open(f.name, 'rb'))}
    r = requests.post("http://127.0.0.1:5000/post", files=files)
    print "gives status code = %s" % r.status_code

post(16)
post(40845)
post(40846)
result from client
Post with size 16 gives status code = 200
Post with size 40845 gives status code = 413
Post with size 40846
Traceback (most recent call last):
  File "client.py", line 18, in <module>
    post(40846)
  File "client.py", line 13, in post
    r = requests.post("http://127.0.0.1:5000/post", files=files)
  File "/opt/python_env/renter/lib/python2.7/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/opt/python_env/renter/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/python_env/renter/lib/python2.7/site-packages/requests/sessions.py", line 357, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/python_env/renter/lib/python2.7/site-packages/requests/sessions.py", line 460, in send
    r = adapter.send(request, **kwargs)
  File "/opt/python_env/renter/lib/python2.7/site-packages/requests/adapters.py", line 354, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /post (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)
my versions
$ pip freeze
Flask==0.10.1
Flask-Mail==0.9.0
Flask-SQLAlchemy==1.0
Flask-Uploads==0.1.3
Jinja2==2.7.1
MarkupSafe==0.18
MySQL-python==1.2.4
Pillow==2.1.0
SQLAlchemy==0.8.2
Werkzeug==0.9.4
blinker==1.3
itsdangerous==0.23
passlib==1.6.1
python-dateutil==2.1
requests==2.0.0
simplejson==3.3.0
six==1.4.1
virtualenv==1.10.1
voluptuous==0.8.1
wsgiref==0.1.2
Flask is closing the connection; you can set an error handler for the 413 error:
@app.errorhandler(413)
def request_entity_too_large(error):
    return 'File Too Large', 413
Now the client should get a 413 error; note that I didn't test this code.
Update:
I tried recreating the 413 error, and I didn't get a ConnectionError exception.
Here's a quick example:
from flask import Flask, request

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 1024

@app.route('/post', methods=('POST',))
def view_post():
    return request.data

app.run(debug=True)
After running the file, I used the terminal to test requests and sending large data:
>>> import requests
>>> r = requests.post('http://127.0.0.1:5000/post', data={'foo': 'a'})
>>> r
<Response [200]>
>>> r = requests.post('http://127.0.0.1:5000/post', data={'foo': 'a'*10000})
>>> r
<Response [413]>
>>> r.status_code
413
>>> r.content
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>413 Request Entity Too Large</title>\n<h1>Request Entity Too Large</h1>\n<p>The data value transmitted exceeds the capacity limit.</p>\n'
As you can see, we got the 413 error response from Flask, and requests didn't raise an exception.
By the way I'm using:
Flask: 0.10.1
Requests: 2.0.0
RFC 2616, the specification for HTTP 1.1, says:
10.4.14 413 Request Entity Too Large
The server is refusing to process a request because the request
entity is larger than the server is willing or able to process. The
server MAY close the connection to prevent the client from continuing
the request.
If the condition is temporary, the server SHOULD include a Retry-After
header field to indicate that it is temporary and after what time the
client MAY try again.
This is what's happening here: flask is closing the connection to prevent the client from continuing the upload, which is giving you the Broken pipe error.
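On the client side you can't stop the server from dropping the connection, but you can treat the broken pipe as a probable 413, or avoid it by checking the size before posting. A rough sketch of my own (not tested against the setup above; safe_post is a made-up helper and MAX_CONTENT_LENGTH is assumed to mirror the server's limit):

import os
import requests

MAX_CONTENT_LENGTH = 1024  # assumed to mirror the server's configured limit

def safe_post(url, path):
    # Check the size locally first so the broken pipe never happens.
    if os.path.getsize(path) > MAX_CONTENT_LENGTH:
        print "File too large, not uploading"
        return None
    with open(path, 'rb') as f:
        try:
            return requests.post(url, files={'file': f})
        except requests.exceptions.ConnectionError:
            # The server most likely closed the connection mid-upload (413).
            print "Upload rejected (connection closed by the server)"
            return None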
Based on the answers in this GitHub issue (https://github.com/benoitc/gunicorn/issues/1733#issuecomment-377000612):
@app.before_request
def handle_chunking():
    """
    Sets the "wsgi.input_terminated" environment flag, thus enabling
    Werkzeug to pass chunked requests as streams. The gunicorn server
    should set this, but it's not yet been implemented.
    """
    transfer_encoding = request.headers.get("Transfer-Encoding", None)
    if transfer_encoding == u"chunked":
        request.environ["wsgi.input_terminated"] = True