I am trying to make a single REST API call using the POST method. There will be around 500-plus calls to the same endpoint from different hosts. Some of them result in a 503 error, so I tried to implement a retry mechanism using the Retry class of the requests module. Even after implementing the retries, I am still getting the same error.
The code snippet is below:
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
import json

requests.packages.urllib3.disable_warnings()

s = requests.Session()
retries = Retry(total=3, backoff_factor=0.3, status_forcelist=[502, 503, 504],
                method_whitelist=frozenset(['GET', 'POST']))
s.mount('https//', HTTPAdapter(max_retries=retries))
s.mount('http//', HTTPAdapter(max_retries=retries))
response = s.post(url, json=json_payload, headers=headers)
I am still getting the 503 error. I have already gone through previous answers to questions of the same pattern, but I could not find more information on this.
Note: the script is executed at the same time from 525 hosts, so 500-plus hosts try to trigger the call to the URL (an internal URL) at once. About 350 hosts get a proper 200 response; the others end up with a 503 error.
Correct me if I am missing something here. Any help is appreciated.
It was basically a typo in the code that was breaking the retry mechanism. The ':' was missing from 'http'/'https' in the mount prefixes:
s.mount('https://', HTTPAdapter(max_retries=retries))
s.mount('http://', HTTPAdapter(max_retries=retries))
After adding the ':' it worked well.
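As a side note (not part of the original fix): on urllib3 1.26 and later, the method_whitelist argument was renamed to allowed_methods, and the old name was removed entirely in urllib3 2.0. On a current stack the equivalent Retry would be roughly:

from urllib3.util.retry import Retry

# allowed_methods replaces the deprecated method_whitelist argument
retries = Retry(total=3, backoff_factor=0.3, status_forcelist=[502, 503, 504],
                allowed_methods=frozenset(['GET', 'POST']))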
I am having some trouble getting the picture from my IP camera in Python. I have an Axis camera and almost got it working with the RTSP link and cv2 VideoCapture, but after some hours I get an h264 error (I asked about that problem here).
So I decided to use a GET request to fetch the picture, but now I get a 401 error. Here is my code:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get("http://xxx.xxx.xxx.xxx/jpg/image.jpg", auth=HTTPBasicAuth('xxx', 'xxx'))
print(r.status_code)
I also tried it without HTTPBasicAuth, but got the same result. I don't know how to get the auth right here.
Any help?
There is nothing wrong with your code. I ran the same code and it works fine on my side. I would suggest you verify the credentials you provided, since a 401 response code is returned when you supply a wrong username or password.
Additionally, don't forget to pass the stream=True parameter to requests.get, otherwise the process will never successfully return anything even if the credentials actually work.
import requests
from requests.auth import HTTPBasicAuth

r = requests.get("http://xxx.xxx.xxx.xxx/jpg/image.jpg",
                 auth=HTTPBasicAuth('xxx', 'xxx'), stream=True)
for streamDataChunks in r:
    process_raw_image_data(streamDataChunks)
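For completeness, a minimal sketch of writing the streamed JPEG to disk; snapshot.jpg is an arbitrary filename chosen for this example, not part of the original answer:

import requests
from requests.auth import HTTPBasicAuth

r = requests.get("http://xxx.xxx.xxx.xxx/jpg/image.jpg",
                 auth=HTTPBasicAuth('xxx', 'xxx'), stream=True)
r.raise_for_status()  # fail fast on 401/403 instead of saving an error page
with open("snapshot.jpg", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)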
Consider an HTTP request using an OAuth token. The access token needs to be included in the header as a bearer token. However, if the token has expired, another request needs to be made to refresh the token before trying again. So the custom Retry object will look like:
s = requests.Session()
# token is added to the header here
s.headers.update(token_header)
retry = OAuthRetry(
    total=2,
    read=2,
    connect=2,
    backoff_factor=1,
    status_forcelist=[401],
    method_whitelist=frozenset(['GET', 'POST']),
    session=s
)
adapter = HTTPAdapter(max_retries=retry)
s.mount('http://', adapter)
s.mount('https://', adapter)
r = s.post(url, data=data)
The Retry class:
class OAuthRetry(Retry):
    def increment(self, method, url, *args, **kwargs):
        # refresh the token here. This could be by getting a reference to the session or any other way.
        return super(OAuthRetry, self).increment(method, url, *args, **kwargs)
The problem is that after the token is refreshed, HTTPConnectionPool still uses the same headers to make the request after calling increment. See: https://github.com/urllib3/urllib3/blob/master/src/urllib3/connectionpool.py#L787.
Although the pool instance is passed to increment, changing the headers there will not affect the call, since the pool uses a local copy of the headers.
This seems like a use case that should come up frequently: request parameters need to change between retries.
Is there a way to change the request headers between two subsequent retries?
No, not in the current versions of Requests (2.18.4) and urllib3 (1.22).
Retries are ultimately handled by urlopen in urllib3, and tracing the code of that whole function shows there is no interface for changing headers between retries.
Dynamically changing headers should not be considered a solution anyway. From the doc:
headers – Dictionary of custom headers to send, such as User-Agent, If-None-Match, etc. If None, pool headers are used. If provided, these headers completely replace any pool-specific headers.
headers is a parameter passed to the function, and there is no guarantee that it will not be copied after being passed. Although in the current version of urllib3 urlopen does not copy headers, any solution based on changing them is hacky, since it relies on the implementation rather than the documentation.
One workaround
Interrupting a function and editing some variable it is using is very dangerous. Instead of injecting something into urllib3, one simple solution is to check the response status and try again if needed:
r = s.post(url, data=data)
if r.status_code == 401:
    # refresh the token here.
    r = s.post(url, data=data)
Why does the original approach not work?
Requests copies the headers in prepare_headers before sending them to urllib3, so when urllib3 retries, it uses the copy that was created before the edit.
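To make that workaround concrete, here is a minimal sketch; refresh_token() is a hypothetical helper that returns a fresh Authorization header dict:

import requests

def post_with_refresh(s, url, data, max_attempts=2):
    # retry the POST with a refreshed token as long as we keep getting 401s
    r = s.post(url, data=data)
    for _ in range(max_attempts - 1):
        if r.status_code != 401:
            break
        # refresh_token() is hypothetical; it should return something like
        # {'Authorization': 'Bearer <new token>'}
        s.headers.update(refresh_token())
        r = s.post(url, data=data)
    return r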
I'm using Python 3.7 with urllib.
Everything works fine, but urllib does not seem to automatically follow redirects when it gets an HTTP redirect response (307).
This is the error I get:
ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect
I have to handle it with a try-except and manually send another request to the new Location; that works fine, but I don't like it.
This is the code I use to perform the request:
import urllib.request

req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.data = jdati
self.logger.debug(req.headers)
self.logger.info(req.data)
resp = urllib.request.urlopen(req)
url is an https resource, and I set a header with some Authorization info and the content type.
req.data is a JSON payload.
From the urllib documentation I understood that redirects are performed automatically by the library itself, but it doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL.
I've also tried to use an opener specifying the default redirect handler, but with the same result:
opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type', 'application/json; charset=utf-8')
req.data = jdati
resp = opener.open(req)
What could be the problem?
The reason why the redirect isn't followed automatically was correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: given that data has been assigned, get_method automatically returns POST (as per how that method is implemented), and since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
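As a quick illustration of that get_method behavior (using the same httpbin URL as the experiment below):

import urllib.request

req = urllib.request.Request('http://httpbin.org/status/307')
print(req.get_method())  # GET: no data attached yet
req.data = b'hello'
print(req.get_method())  # POST: attaching data switches the default method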
For an experiment, try the following code:
import urllib.request
import urllib.parse

url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling

try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url

print('Response URL -> %s ' % resp.url)  # the final url
Running the code as is may produce the following
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to get was done automatically, since the subsequent request was a GET request. Commenting out the req.data assignment line will result in the lack of the "Redirected" output line.
Other things worth noting in the exception-handling block: e.read() may be called to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response worth processing), and urljoin is needed because the Location header may be a relative URL (or simply missing the host) for the subsequent resource.
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that those questions never got any answers; they follow:
How to handle 307 redirection using urllib2 from http to https
HTTP Error 307: Temporary Redirect in Python3 - INTRANET
HTTP Error 307 - Temporary redirect in python script
I am using the Python Requests module (v2.19.1) with Python 3.4.3, calling a function on a remote server that generates a .csv file for download. In general, it works perfectly. There is one particular file that takes more than 6 minutes to generate, and no matter what I set the timeout parameter to, I get an error after exactly 5 minutes of trying to generate that file.
import requests
s = requests.Session()
authPayload = {'UserName': 'myloginname','Password': 'password'}
loginURL = 'https://myremoteserver.com/login/authenticate'
login = s.post(loginURL, data=authPayload)
backupURL = 'https://myremoteserver.com/directory/jsp/Backup.jsp'
payload = {'command': fileCommand}
headers = {'Connection': 'keep-alive'}
post = s.post(backupURL, data=payload, headers=headers, timeout=None)
This times out after exactly 5 minutes with the error:
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 612, in urlopen
raise MaxRetryError(self, url, e)
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='myremoteserver.com', port=443): Max retries exceeded with url: /directory/jsp/Backup.jsp (Caused by <class 'http.client.BadStatusLine'>: '')
If I set the timeout to something much smaller, say 5 seconds, I get an error that makes perfect sense:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='myremoteserver.com', port=443): Read timed out. (read timeout=5)
If I run the process from a browser, it works fine, so it doesn't seem to be the remote server closing the connection, or a firewall or something in between dropping it.
Posted at the request of the OP; my comments on the original question pointed to a related SO problem.
The clue to the problem lies in the http.client.BadStatusLine error.
Take a look at the following related SO Q&A that discusses the impact of proxy servers on HTTP requests and responses.
Scope:
I am currently trying to write a web scraper for this specific page. I have a pretty strong web-crawling background from C#, but this httplib is getting the better of me.
Problem:
When trying to make an HTTP GET request for the page specified above, I get a "Moved Permanently" (301) that points to the very same URL. I can make the request using the requests lib, but I want to make it work using httplib so I can understand what I am doing wrong.
Code Sample:
I am completely new to Python, so any violation of language guidelines or syntax is C#'s fault.
import httplib

# Wrapper for an "HTTP GET" request
class HttpClient(object):
    def HttpGet(self, url, host):
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()

# Using the "HttpClient" class.
# This is the full URL I need to make a GET request for: https://420101.com/strain-database
httpclient = HttpClient()
httpResponseText = httpclient.HttpGet('/strain-database', 'www.420101.com')
print httpResponseText
I really want to make it work using the httplib library instead of requests or any other fancy one, because I feel like I am missing something really small here.
The problem: I'd had too little or too much caffeine in my system.
To GET an https resource, I needed the HTTPSConnection class.
Also, there is no 'www' in the address I wanted to GET, so it shouldn't be included in the host.
Both of the wrong addresses redirect to the correct one with a 301 status code. If I were using requests or a more full-featured module, it would have followed the redirect automatically.
My Validation:
c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason
200 OK
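For reference, a minimal sketch of the same validation on Python 3, where httplib was renamed to http.client (a standard-library fact, not part of the original answer):

import http.client

c = http.client.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print(r.status, r.reason)  # expect: 200 OK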