I keep seeing examples similar to this for adding retries when using the Requests library. However, I'm unsure whether Requests will execute choice(my_proxy_list) on every retry, and thus pick a new proxy, or just keep retrying with the same arguments used for the initial request.
from random import choice

import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

from my_proxy_list import my_proxy_list  # a list of proxy dicts, defined elsewhere

session = requests.Session()
retries = Retry(total=5, backoff_factor=0.4,
                status_forcelist=[400, 429, 500, 502, 503, 504])
session.mount("http://", HTTPAdapter(max_retries=retries))
response = session.get(
    url=url,  # url defined elsewhere
    proxies=choice(my_proxy_list),
    timeout=(10, 27),
)
session.close()
Function arguments are evaluated once, regardless of what that function goes on to do later (including any retry logic that function may use internally), so random.choice will only be called once in your example.
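A tiny sketch to illustrate (the function and values here are made up for illustration, standing in for session.get and your proxies):

from random import choice

def fake_get(proxy):
    # stand-in for session.get: pretend it retries 3 times internally
    for attempt in range(3):
        print("attempt %d using %s" % (attempt, proxy))

# choice() runs exactly once, before fake_get is entered;
# every internal "retry" sees the same proxy value
fake_get(choice(["proxy-a", "proxy-b", "proxy-c"]))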
The best option, if you want a (chance at a) different proxy on each attempt, is to implement your own retry logic that calls random.choice on every attempt (sketched below). To guarantee a different proxy each time, you can shuffle the list of possible proxies at the start and then traverse it.
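A minimal sketch of that manual retry loop, assuming my_proxy_list is a list of proxy dicts and url is already defined; the helper name, attempt limit, and backoff values are illustrative, not from the original post:

import time
from random import shuffle

import requests

def get_with_proxy_rotation(session, url, proxy_list, max_attempts=5, backoff=0.4):
    proxies = list(proxy_list)
    shuffle(proxies)  # guarantees a different proxy on every attempt
    for attempt, proxy in enumerate(proxies[:max_attempts]):
        try:
            response = session.get(url, proxies=proxy, timeout=(10, 27))
            if response.status_code not in (429, 500, 502, 503, 504):
                return response
        except requests.RequestException:
            pass  # connection error: fall through and try the next proxy
        time.sleep(backoff * (2 ** attempt))  # simple exponential backoff
    raise RuntimeError("all %d proxy attempts failed" % max_attempts)

session = requests.Session()
response = get_with_proxy_rotation(session, url, my_proxy_list)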
Alternatively, it would be possible to pass in a dictionary-like object for proxies with a __getitem__ designed to return a random proxy each time, but that approach isn't recommended as it'd be very fragile and would rely heavily on the implementation details of session.get.
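Purely to illustrate the idea (and echoing the warning above: whether session.get actually performs a per-request __getitem__ lookup on this mapping depends on requests internals, which is exactly why this is fragile), such an object could look like:

from random import choice

class RandomProxies(dict):
    # dictionary-like: any scheme lookup returns a proxy chosen at lookup time
    def __init__(self, proxy_urls):
        super(RandomProxies, self).__init__()
        self._proxy_urls = proxy_urls  # e.g. ["http://1.2.3.4:8080", ...]

    def __getitem__(self, scheme):
        return choice(self._proxy_urls)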
I had to use the requests library to fetch a huge number of URLs. Due to network errors, I wanted to implement a retry mechanism, so my code looks like this:
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=1, backoff_factor=0.5, status_forcelist=(500, 502, 503, 504))
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
head_response = session.get(url, timeout=2)
However, the code keeps hanging when the URL is tiffins.NET. A plain requests.get with timeout=2 returns a 503 status code but doesn't hang.
Am I doing something wrong?
I'm using Python 2.7.15rc1.
Just had the same problem and figured out that it's because of the website's Retry-After header. By default, Retry uses that value to decide how long to wait before the next request. To ignore it, set the respect_retry_after_header parameter to False when creating the Retry object.
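Applied to the code from the question, that looks like this (respect_retry_after_header is a standard urllib3 Retry parameter; the rest is unchanged):

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=1, backoff_factor=0.5, status_forcelist=(500, 502, 503, 504),
              respect_retry_after_header=False)  # don't sleep for the server's Retry-After
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)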
I have written a program in Python. I am using the requests library and want to use a Session.
I created a function for the network request; its arguments are a url (str) and a session object. (I use the session to request a website.)
The problem: the session does not seem to be fixed (just my opinion). When I pass the session as an argument to the function, the request returns the wrong page; but when I don't use the function (I used threads with the function) and request another website with the session after the first request, the returned page is right. So I think the problem is that the session object in the function's local scope is not the same as the global session. I want to check whether the two sessions are the same; how can I do that?
My mother tongue is not English, so my English is not good; please excuse me.
code:
import requests
from tomorrow import threads

# create a session
s = requests.Session()

# the right code:
# s.post(url1)
# s.get(url2)

# the wrong code:
@threads(100)  # run request() in a pool of 100 threads
def request(url, session):
    return session.get(url)

s.get(url1)
req = request(url2, s)  # this returns a wrong page
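As for checking whether the session inside the function is the same object as the global one: this is plain Python object identity, nothing requests-specific, so a minimal check looks like this:

import requests

s = requests.Session()

def request(url, session):
    # True means the function received the very same Session object as the global s
    print(session is s)
    print(id(session) == id(s))  # equivalent identity check
    return session.get(url)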
A little context: I am opening this question, which arose here, after solving an authentication problem. I prefer to open a new one to avoid polluting the previous question with comments not related to the original issue, and to give this one proper visibility.
I am working on a SOAP client running in the same intranet as the server, without internet access.
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

wsdl = 'http://mysite.dom/services/MyWebServices?WSDL'
session = Session()
session.auth = HTTPBasicAuth('user', 'pass')
client = Client(wsdl, transport=Transport(session=session, cache=None))
The problem: the WSDL contains an import of an external resource located outside the intranet ('import namespace="http://schemas.xmlsoap.org/soap/encoding/"'), and therefore Zeep Client instantiation fails with:
Exception: HTTPConnectionPool(host='schemas.xmlsoap.org', port=80): Max retries exceeded with url: /soap/encoding/ (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3dab9d30b8>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Question: is it possible (and does it make sense) to create the Zeep Client without accessing the external resource?
As an additional detail, another client written in Java, based on the XML RPC ServiceFactory, seems to be more resilient to this kind of problem: the service is created (and works) even if no internet connection is available.
Is it really needed to import the namespace from xmlsoap.org?
Edit, after the answer from @mvt:
So, I went for the proposed solution, which at the same time allows me to control access to external resources (read: forbid access to servers other than the one hosting the endpoint).
import os
from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

import zeep

class MyTransport(zeep.Transport):
    def load(self, url):
        if not url:
            raise ValueError("No url given to load")

        parsed_url = urlparse(url)
        if parsed_url.scheme in ('http', 'https'):
            if parsed_url.netloc == "myserver.ext":
                response = self.session.get(url, timeout=self.load_timeout)
                response.raise_for_status()
                return response.content
            elif url == "http://schemas.xmlsoap.org/soap/encoding/":
                # serve the external schema from a local copy instead
                url = "/some/path/myfile.xsd"
            else:
                raise ValueError("Access to external resource forbidden: %s" % url)
        elif parsed_url.scheme == 'file':
            if url.startswith('file://'):
                url = url[7:]

        # fall through: read the (possibly remapped) path from disk
        with open(os.path.expanduser(url), 'rb') as fh:
            return fh.read()
I would suggest doing your custom overriding of the URL and then calling load() from the superclass. This way, if the superclass code changes (which it has), you will not need to refactor your CustomTransport class.
from zeep.transports import Transport
class CustomTransport(Transport):
def load(self, url):
# Custom URL overriding to local file storage
if url and url == "http://schemas.xmlsoap.org/soap/encoding/":
url = "/path/to/schemas.xmlsoap.org.xsd"
# Call zeep.transports.Transport's load()
return super(CustomTransport, self).load(url)
The way to use the Transports in zeep is described here, but here is a quick example of using the CustomTransport:
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
session = Session()
client = Client('http://example.com/production.svc?wsdl', transport=CustomTransport(session=session))
client.service.foo()
You could create your own subclass of the transport class and add additional logic to the load() method so that specific URLs are redirected to / loaded from the filesystem.
The code is pretty easy, I think: https://github.com/mvantellingen/python-zeep/blob/master/src/zeep/transports.py :-)
So I am making a bunch of requests to website X every second, as of now with the standard urllib packages, like so (the request returns JSON):
import json
import threading
import time
import urllib.request

def makerequests():
    request = urllib.request.Request('http://www.X.com/Y')
    while True:
        time.sleep(0.2)
        response = urllib.request.urlopen(request)
        data = json.loads(response.read().decode('utf-8'))

for i in range(4):
    t = threading.Thread(target=makerequests)
    t.start()
However, because I'm making so many requests, after about 500 of them the website returns HTTPError 429: Too Many Requests. I was thinking it might help to re-use the initial TCP connection, but I noticed it is not possible to do this with the urllib packages.
So I did some googling and discovered that the following packages might help:
Requests
http.client
socket?
So I have a question: which one is best suited for my situation, and can someone show an example of one of them (for Python 3)?
requests handles keep-alive automatically if you use a session. This might not actually help you if the server is rate limiting requests; however, requests also handles parsing JSON, so that's a good reason to use it. Here's an example:
import time

import requests

s = requests.Session()

while True:
    time.sleep(0.2)
    response = s.get('http://www.X.com/y')
    data = response.json()
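If the server is rate limiting, keep-alive alone won't make the 429s go away; one option (a sketch, not part of the original answer) is to combine the session with a urllib3 Retry that backs off on 429 responses, as described earlier on this page (by default Retry also honors the server's Retry-After header):

import time

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

s = requests.Session()
retry = Retry(total=5, backoff_factor=1, status_forcelist=[429])  # back off on 429s
adapter = HTTPAdapter(max_retries=retry)
s.mount('http://', adapter)
s.mount('https://', adapter)

while True:
    time.sleep(0.2)
    data = s.get('http://www.X.com/y').json()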
I am using the suds package to make API requests to a website. I wrote a function that opens a client to the website and makes a request.
I am wondering whether, and how, I should terminate the connection at the end of the function.
I am also wondering whether the client is something like MySQLdb.connect, which would open many separate API connections that never close each time I call this function.
from suds.client import Client
import sys, re

def querysearch(reqPartNumber, reqMfg, lock):
    try:
        client = Client('http://app....')
        userInfo = {'id': ..., 'password': ...}
        apiResponse = client.service.getParts(...)
        ...
        print apiResponse
    except:
        ...
SOAP still goes over HTTP, which is stateless. Each request starts a whole new connection, re-authenticates, etc. Browsers kind of short-circuit that with cookies, but SOAP doesn't. So you don't need to close the connection; it's already closed by the time suds returns your data to you.
Additionally, looking at the latest source, Client() doesn't define a close or __exit__ method, so there's nothing you really have to do here.
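A practical consequence (my sketch, not from the original answer): since there is nothing to close, you can build the Client once and reuse it, which avoids re-downloading and re-parsing the WSDL on every call. The URL is the placeholder from the question, and the getParts argument list is a guess based on the function's parameters:

from suds.client import Client

# construct once: this is when suds downloads and parses the WSDL
client = Client('http://app....')

def querysearch(reqPartNumber, reqMfg, lock):
    # reuse the already-built client; nothing needs to be closed afterwards
    return client.service.getParts(reqPartNumber, reqMfg)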