Retry for Python requests module hanging

I have to use the requests module to fetch a huge number of URLs. Because of network errors, I wanted to implement a retry mechanism, so my code looks like this:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
import requests
session = requests.Session()
retry = Retry(total=1, backoff_factor=0.5, status_forcelist=(500, 502, 503, 504))
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
head_response = session.get(url, timeout=2)
However, the code keeps hanging when the URL is tiffins.NET. A normal requests.get with timeout=2 gives a 503 status code but doesn't hang.
Am I doing something wrong?
I'm using Python 2.7.15rc1.

I just had the same problem and figured out that it's caused by the website's "Retry-After" header. By default, Retry uses that value to wait before sending the next request. To ignore it, set the respect_retry_after_header parameter to False when creating the Retry object.
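For example, building on the snippet from the question (assuming a urllib3 version recent enough to support respect_retry_after_header), a minimal sketch would be:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
# Ignore the server's Retry-After header so a huge value cannot stall the retry
retry = Retry(
    total=1,
    backoff_factor=0.5,
    status_forcelist=(500, 502, 503, 504),
    respect_retry_after_header=False,
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
head_response = session.get(url, timeout=2)  # url as defined in your own code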

Related

Getting a 401 response while using Requests package

I am trying to access a server over my internal network under https://prodserver.de/info.
I have the code structure as below:
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password))
print(resp.status_code)
While trying to access this server via browser, it works perfectly fine.
What am I doing wrong?
By default, the requests library verifies the SSL certificate for HTTPS requests. If the certificate cannot be verified, it raises an SSLError. You can check whether this is the issue by disabling certificate verification, passing verify=False as an argument to the get method.
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password), verify=False)
print(resp.status_code)
Try using requests' generic auth, like this:
resp = requests.get('https://prodserver.de/info/', auth=(username, password))
What am I doing wrong?
I cannot be sure without investigating your server, but I suggest checking the assumption you have made that the server uses Basic authentication. There are various authentication schemes, and it is also possible that your server uses a cookie-based solution rather than a header-based one.
While trying to access this server via browser, it works perfectly fine.
You might then use the browser's developer tools to see what is actually sent with the request that succeeds.
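For instance, one quick check (a minimal sketch using the URL from the question) is to look at the WWW-Authenticate header of the 401 response to see which scheme the server actually expects:

import requests

resp = requests.get('https://prodserver.de/info/')
print(resp.status_code)
# A 401 response usually advertises the expected scheme(s) here,
# e.g. 'Basic realm="..."', 'Digest ...' or 'Negotiate'
print(resp.headers.get('WWW-Authenticate'))

If it turns out to be Digest rather than Basic, requests.auth.HTTPDigestAuth can be used instead of HTTPBasicAuth.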

How to handle loss of connection in python?

I'm working on requesting some data from an API server:
res = requests.get(path, auth=HTTPBasicAuth(user, password))
I'm trying to test different cases:
Permanent connection loss: solved with the ConnectionError exception.
Sudden loss: the connection drops in the middle of a request.
How would I handle the second case? The try-except is not working.
Try importing HTTPAdapter and Retry from requests.adapters and then mounting the adapter on a Session(), like this:
from requests import Session
from requests.adapters import HTTPAdapter, Retry
session = Session()
adapter = HTTPAdapter(max_retries=Retry(total=10, backoff_factor=0.5))
session.mount('https://', adapter)
session.mount('http://', adapter)
session.get(...)
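Retries cover failures that happen before a response is received; to also survive a connection that drops mid-request, you can still wrap the call in try/except. A rough sketch (the URL and the timeout/retry numbers are placeholders, not values from the question):

from requests import Session
from requests.adapters import HTTPAdapter, Retry
from requests.exceptions import ConnectionError, RetryError, Timeout

session = Session()
adapter = HTTPAdapter(max_retries=Retry(total=10, backoff_factor=0.5))
session.mount('https://', adapter)
session.mount('http://', adapter)

try:
    # The (connect, read) timeout also catches a connection that stalls mid-transfer
    res = session.get('https://example.com/api', timeout=(5, 30))
    res.raise_for_status()
except (ConnectionError, Timeout, RetryError) as exc:
    # Connection dropped or the retries were exhausted
    print('Request failed:', exc)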

Requests Retry: will same proxy be used

I keep seeing examples similar to this for adding retries when using the Requests library. However, I'm unsure whether Requests will execute choice(my_proxy_list) on every retry and thus get a new proxy, or just keep retrying with the same arguments used for the initial request.
import my_proxy_list
from random import choice
import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.4, status_forcelist=[400, 429, 500, 502, 503, 504])
session.mount("http://", HTTPAdapter(max_retries=retries))
response = session.get(
    url=url,
    proxies=choice(my_proxy_list),
    timeout=(10, 27),
)
session.close()
Function arguments are evaluated once, regardless of what that function goes on to do later (including any retry logic that function may use internally), so random.choice will only be called once in your example.
The best option if you want a (chance at a) different proxy each time is to do your own retry logic which calls random.choice each time. To guarantee a different proxy each time, you could shuffle the list of possible proxies at the start and then traverse it.
Alternatively, it would be possible to pass in a dictionary-like object for proxies with a __getitem__ designed to return a random proxy each time, but that approach isn't recommended as it'd be very fragile and would rely heavily on the implementation details of session.get.
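A minimal sketch of that first option (url and my_proxy_list come from the question; the attempt count and the decision to retry on any RequestException are assumptions):

import random
import requests

def get_with_rotating_proxy(url, proxy_list, attempts=5):
    # Sample up front and walk the list so every attempt uses a different proxy
    candidates = random.sample(proxy_list, min(attempts, len(proxy_list)))
    last_error = None
    for proxy in candidates:
        try:
            return requests.get(url, proxies=proxy, timeout=(10, 27))
        except requests.RequestException as exc:
            last_error = exc
    raise RuntimeError('All proxy attempts failed') from last_error

response = get_with_rotating_proxy(url, my_proxy_list)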

Python SOAP client with Zeep - import namespace

A little context: I am opening this question, which arose here, after solving an authentication problem. I prefer to open a new one to avoid polluting the previous one with comments not related to the original issue, and to give it proper visibility.
I am working on a SOAP client running in the same intranet as the server, without internet access.
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

session = Session()
session.auth = HTTPBasicAuth('user', 'pass')
wsdl = 'http://mysite.dom/services/MyWebServices?WSDL'
client = Client(wsdl, transport=Transport(session=session, cache=None))
The problem: WSDL contains an import to an external resource located outside the intranet ('import namespace="schemas.xmlsoap.org/soap/encoding/"') and therefore Zeep Client instantiation fails with:
Exception: HTTPConnectionPool(host='schemas.xmlsoap.org', port=80): Max retries exceeded with url: /soap/encoding/ (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3dab9d30b8>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Question: is it possible (and does it make sense) to create the Zeep Client without accessing the external resource?
As an additional detail, another client written in Java, based on XML rpc ServiceFactory seems to be more resilient to this kind of problem, the service is created (and works) even if no internet connection is available.
Is it really needed to import the namespace from xmlsoap.org?
Edit, after the answer from @mvt:
So, I went for the proposed solution, which at the same time allows me to control access to external resources (read: forbid access to servers other than the one hosting the endpoint).
import os
from urllib.parse import urlparse

import zeep


class MyTransport(zeep.Transport):
    def load(self, url):
        if not url:
            raise ValueError("No url given to load")
        parsed_url = urlparse(url)
        if parsed_url.scheme in ('http', 'https'):
            if parsed_url.netloc == "myserver.ext":
                response = self.session.get(url, timeout=self.load_timeout)
                response.raise_for_status()
                return response.content
            elif url == "http://schemas.xmlsoap.org/soap/encoding/":
                # Serve the external schema from a local copy instead
                url = "/some/path/myfile.xsd"
            else:
                raise ValueError("Access to external resource not allowed: %s" % url)
        elif parsed_url.scheme == 'file':
            if url.startswith('file://'):
                url = url[7:]
        with open(os.path.expanduser(url), 'rb') as fh:
            return fh.read()
I would suggest doing your custom overriding of the URL and then calling load() from the superclass. This way, if the superclass code changes (which it has), you will not need to refactor your CustomTransport class.
from zeep.transports import Transport
class CustomTransport(Transport):
    def load(self, url):
        # Custom URL overriding to local file storage
        if url and url == "http://schemas.xmlsoap.org/soap/encoding/":
            url = "/path/to/schemas.xmlsoap.org.xsd"
        # Call zeep.transports.Transport's load()
        return super(CustomTransport, self).load(url)
The way to use the Transports in zeep is described here, but here is a quick example of using the CustomTransport:
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
session = Session()
client = Client('http://example.com/production.svc?wsdl', transport=CustomTransport(session=session))
client.service.foo()
You could create your own subclass of the Transport class and add additional logic to the load() method so that specific URLs are redirected to or loaded from the filesystem.
The code is pretty easy, I think: https://github.com/mvantellingen/python-zeep/blob/master/src/zeep/transports.py :-)

Re-use http connection Python 3

So I am making a bunch of requests to website X every second, as of now with the standard urllib packages, like so (the request returns JSON):
import json
import urllib.request
import threading, time

def makerequests():
    request = urllib.request.Request('http://www.X.com/Y')
    while True:
        time.sleep(0.2)
        response = urllib.request.urlopen(request)
        data = json.loads(response.read().decode('utf-8'))

for i in range(4):
    t = threading.Thread(target=makerequests)
    t.start()
However, because I'm making so many requests, after about 500 requests the website returns HTTPError 429: Too many requests. I was thinking it might help if I re-used the initial TCP connection, but I noticed it was not possible to do this with the urllib packages.
So I did some googling and discovered that the following packages might help:
Requests
http.client
socket ?
So I have a question: which one is best suited for my situation and can someone show an example of either one of them (for Python 3)?
requests handles keep-alive automatically if you use a session. This might not actually help you if the server is rate limiting requests; however, requests also handles parsing JSON, so that's a good reason to use it. Here's an example:
import time
import requests

s = requests.Session()

while True:
    time.sleep(0.2)
    response = s.get('http://www.X.com/y')
    data = response.json()
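If the 429 responses persist even over a single keep-alive connection, you may need to back off when the server tells you to. A rough sketch (the 5-second fallback is an assumption; the server's Retry-After header, when present in its plain-seconds form, takes precedence):

import time
import requests

s = requests.Session()

while True:
    response = s.get('http://www.X.com/y')
    if response.status_code == 429:
        # Honour the server's Retry-After header if present
        # (it may also be an HTTP date; this sketch only handles the seconds form)
        wait = float(response.headers.get('Retry-After', 5))
        time.sleep(wait)
        continue
    data = response.json()
    time.sleep(0.2)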
