Change request headers between subsequent retries - python

Consider an HTTP request using an OAuth token. The access token needs to be included in the header as a bearer token. However, if the token has expired, another request needs to be made to refresh the token before trying again. So the custom Retry object looks like:
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# token is added to the header here
s.headers.update(token_header)
retry = OAuthRetry(
    total=2,
    read=2,
    connect=2,
    backoff_factor=1,
    status_forcelist=[401],
    method_whitelist=frozenset(['GET', 'POST']),
    session=s
)
adapter = HTTPAdapter(max_retries=retry)
s.mount('http://', adapter)
s.mount('https://', adapter)
r = s.post(url, data=data)
The Retry class:
from urllib3.util.retry import Retry

class OAuthRetry(Retry):
    def increment(self, method, url, *args, **kwargs):
        # Refresh the token here. This could be done by keeping a reference
        # to the session or in any other way.
        return super(OAuthRetry, self).increment(method, url, *args, **kwargs)
The problem is that after the token is refreshed, HTTPConnectionPool still uses the original headers for the request it makes after calling increment. See: https://github.com/urllib3/urllib3/blob/master/src/urllib3/connectionpool.py#L787.
Although the pool instance is passed to increment, changing its headers there will not affect the retried call, since the pool is using a local copy of the headers.
Changing request parameters between retries seems like a use case that should come up frequently.
Is there a way to change the request headers in between two subsequent retries?

No, not with the current versions of Requests (2.18.4) and urllib3 (1.22).
Retries are ultimately handled by urlopen in urllib3, and tracing through that function shows there is no interface for changing headers between retries.
Dynamically changing the headers should not be considered a solution anyway. From the docs:
headers – Dictionary of custom headers to send, such as User-Agent, If-None-Match, etc. If None, pool headers are used. If provided, these headers completely replace any pool-specific headers.
headers is a parameter passed to the function, and there is no guarantee that it will not be copied after being passed. Although in the current version of urllib3 urlopen does not copy headers, any solution based on mutating them is hacky, since it relies on the implementation rather than the documentation.
One workaround
Interrupting a function to edit a variable it is using is very dangerous.
Instead of injecting something into urllib3, a simple solution is to check the response status and try again if needed.
r = s.post(url, data=data)
if r.status_code == 401:
    # refresh the token here
    r = s.post(url, data=data)
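If more than one retry may be needed, the same check can be wrapped in a small loop. This is only a sketch; refresh_token and max_attempts are assumptions, not part of the original question:
def post_with_refresh(session, url, data, refresh_token, max_attempts=2):
    # Retry the POST, refreshing the token whenever a 401 comes back.
    for _ in range(max_attempts):
        r = session.post(url, data=data)
        if r.status_code != 401:
            break
        refresh_token(session)  # e.g. update session.headers with a new bearer token
    return r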
Why does the original approach not work?
Requests copies the headers in prepare_headers before handing them to urllib3, so when retrying, urllib3 uses the copy that was created before your edit.
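A minimal sketch (with a hypothetical URL) that illustrates this copy: once the request has been prepared, later changes to the session headers are no longer seen by it.
import requests

s = requests.Session()
s.headers.update({'Authorization': 'Bearer old-token'})

req = requests.Request('POST', 'https://example.com/api', data={'k': 'v'})
prepped = s.prepare_request(req)  # headers are copied into the prepared request here

s.headers['Authorization'] = 'Bearer new-token'
print(prepped.headers['Authorization'])  # still 'Bearer old-token'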

Related

Python requests - cannot understand how the argument is passed

I am using this code to get data from Twitter API.
The code works, but I cannot understand how.
Specifically, I cannot understand how the auth=bearer_oauth argument works, since I am passing a function, and how the function works, since I am calling it without its argument.
Sorry if this is too basic, but I could not find an answer.
import requests

bearer_token = "AAA"
api_url = "https://api.twitter.com/2/tweets/search/recent"

def bearer_oauth(r):
    r.headers["Authorization"] = f"Bearer {bearer_token}"
    return r

def connect_to_endpoint(url, params):
    response = requests.get(url, auth=bearer_oauth, params=params)
    return response

query_params = {'query': 'test'}
json_response = connect_to_endpoint(api_url, query_params)
The bearer_oauth function is just setting the request's authorization header to the bearer token before the request is sent.
The code you provided essentially has the same functionality as this:
headers = {"Authorization": f"Bearer {bearer_token}"
requests.get(url, headers=headers)
After you send the request, Twitter's server parses the authorization header and checks that the bearer token you supplied is valid and has access to the requested resources.
As for why your specific code works: bearer_oauth is an authentication handler that gets attached to the request. The handler is called while the request is being constructed, and requests passes the request object into it for you, which is why you never call it with its argument yourself.
If you're curious about the implementation, I'd read the internal code here. It looks like the request object is passed to the handler, which then modifies the request (in this case, by setting the authorization header), and then returns the modified request object back to the internal function preparing the request. Then, all of the modified request object's attributes are copied:
# Allow auth to make its changes.
r = auth(self)
# Update self to reflect the auth changes.
self.__dict__.update(r.__dict__)
Since __dict__ is an internal dictionary that holds all the attributes of a single object, everything that was changed about the request object in the handler function will be copied and included in the request before it is sent.
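To make the mechanism concrete, here is a hedged sketch of the same idea using requests.auth.AuthBase, the documented way to build reusable auth handlers (the token value is a placeholder):
import requests

class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # requests calls this with the prepared request; we modify and return it.
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r

response = requests.get("https://api.twitter.com/2/tweets/search/recent",
                        auth=BearerAuth("AAA"), params={"query": "test"})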

How to print text of POST request without making request

If I make the request
import requests

api_key = 'asdfklhsdfkjahsdlgkjahlkdjahfsa'
url = 'https://www.website.com'
headers = {'api-key': api_key,
           'Content-Type': 'application/json'}
request_data = {'foo': 'bar', 'egg': 'spam'}
result = requests.post(url, headers=headers, data=request_data)
The server is contacted. Suppose that instead I want to do something like
request_string = requests.foobar(url, headers=headers, data=request_data)
import os
os.system('curl ' + request_string)
so that I can see what the request is doing without bothering the server (ideally to the point that I could copy and paste it into curl). What would foobar be? Or in general, what is a way to inspect the contents of a request without sending it?
Here's another post that implies that you can use Request().prepare() to observe the request without actually sending it.
Furthermore, the official documentation reads "In some cases you may wish to do some extra work to the body or headers (or anything else really) before sending a request. The simple recipe for this is the following" and then illustrates Request.prepare().
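A minimal sketch of that recipe, reusing url, headers and request_data from the question: prepare the request, print its parts, and only send it later (or not at all) with a Session.
import requests

req = requests.Request('POST', url, headers=headers, data=request_data)
prepped = req.prepare()   # builds the request without sending it

print(prepped.method, prepped.url)
print(prepped.headers)
print(prepped.body)       # the encoded form body, e.g. 'foo=bar&egg=spam'

# If you decide to send it after inspection:
# with requests.Session() as s:
#     response = s.send(prepped)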

request.Request to delete a gitlab branch does not work but works using curl DELETE

I am trying to delete a git branch from gitlab, using the gitlab API with a personal access token.
If I use curl like this:
curl --request DELETE --header "PRIVATE-TOKEN: somesecrettoken" "deleteurl"
then it works and the branch is deleted.
But if I use requests like this:
token_data = {'private_token': "somesecrettoken"}
requests.Request("DELETE", url, data= token_data)
it doesn't work; the branch is not deleted.
Your requests code is indeed not doing the same thing. You are setting data=token_data, which puts the token in the request body. The curl command line uses an HTTP header instead and leaves the body empty.
Do the same in Python:
token_data = {'Private-Token': "somesecrettoken"}
requests.Request("DELETE", url, headers=token_data)
You can also put the token in the URL parameters, via the params argument:
token_data = {'private_token': "somesecrettoken"}
requests.Request("DELETE", url, params=token_data)
This adds ?private_token=somesecrettoken to the URL sent to gitlab.
However, GitLab does accept the private_token value in the request body as well, either as form data or as JSON, so the real problem is that you are using the requests API wrong.
A requests.Request() instance is not going to be sent without additional work. It is normally only needed if you want to access the prepared data before sending.
If you don't need to use this more advanced feature, use the requests.delete() method:
response = requests.delete(url, headers=token_data)
If you do need that feature, use a requests.Session() object: first prepare the request, then send it:
with requests.Session() as session:
    request = requests.Request("DELETE", url, params=token_data)
    prepped = request.prepare()
    response = session.send(prepped)
Even without needing to use prepared requests, a session is very helpful when using an API. You can set the token once, on a session:
with requests.Session() as session:
    session.headers['Private-Token'] = 'somesecrettoken'
    # now all requests via the session will use this header
    response = session.get(url1)
    response = session.post(url2, json=....)
    response = session.delete(url3)
    # etc.
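Whichever form you use, it helps to check the response rather than assume the branch was deleted. A small sketch; the 204 status is what GitLab's branch-deletion endpoint normally returns, but treat that as an assumption:
response = requests.delete(url, headers={'Private-Token': 'somesecrettoken'})
response.raise_for_status()      # raises an HTTPError on 4xx/5xx answers
print(response.status_code)      # typically 204 (No Content) on success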

Python POST request does not take form data with no files

Before downvoting/marking as duplicate, please note:
I have already tried out almost all the methods I could find in the Requests documentation and in related answers, but have not found a solution.
Problem:
I want to make a POST request with a set of headers and form data.
There are no files to be uploaded. In Postman, we set these parameters by selecting 'form-data' under the 'Body' section of the request.
Here is the code I have:
headers = {'authorization': token_string,
           'content-type': 'multipart/form-data; boundary=----WebKitFormBoundaryxxxxxXXXXX12345'}  # I get an 'unsupported application/x-www-form-url-encoded' error if I remove this line
body = {
    'foo1': 'bar1',
    'foo2': 'bar2',
    # ... and other form data, NO FILE UPLOADED
}
# I have also tried the below approach
payload = dict()
payload['foo1'] = 'bar1'
payload['foo2'] = 'bar2'
page = ''
page = requests.post(url, proxies=proxies, headers=headers,
                     json=body, files=json.dump(body))  # also tried data=body, data=payload, files={} when giving data values
Error
{"errorCode":404,"message":"Required String parameter 'foo1' is not present"}
EDIT:
Adding a trace from the network console. I am defining the parameters in the payload in the same way as they appear in the request payload.
Isn't there any GUI at all? You can still grab the network data from Chrome's developer tools. In any case, try this:
headers = {'authorization': token_string}
Perhaps more authorization is needed, or something else is missing, but you shouldn't set Content-Type yourself; requests will handle it for you.
Important: since the content type shows a WebKitFormBoundary, the payload keys must be taken from the "name" attribute of each form field.
Example (I know you won't upload any file, it's just an example): if the form field's name were photo, my payload would look like payload = {'photo': 'myphoto'} (yes, there would be an open file and so on, but I'm keeping it simple).
So your payload would be this (always use the name from the WebKit form data):
payload = {'foo1': 'foo1data',
           'foo2': 'foo2data'}
session.post(url, data=payload, proxies=proxies, ...)
Important: I can see you are calling the methods straight from the requests library. You should first create a session:
session = requests.session()
It will handle cookies, headers, etc., and will reuse the connection instead of opening a new one for every requests.get/post call.
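Putting that advice together, here is a hedged sketch (the URL and token are placeholders) that sends the form fields as plain form data and lets requests pick the Content-Type and boundary:
import requests

url = 'https://www.example.com/api/endpoint'   # placeholder
token_string = 'my-token'                      # placeholder

session = requests.session()
session.headers['authorization'] = token_string

payload = {'foo1': 'bar1', 'foo2': 'bar2'}

# data= sends application/x-www-form-urlencoded; if the endpoint really
# requires multipart/form-data, pass the fields via files= instead, e.g.
# files={'foo1': (None, 'bar1'), 'foo2': (None, 'bar2')}.
response = session.post(url, data=payload)
print(response.status_code, response.text)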

Passing session from template view to python requests api call

I want to make multiple internal REST API calls from my Django TemplateView, using the requests library, and I want to pass the session along from the template view to the API call. What is the recommended way to do that, keeping performance in mind?
Right now I'm extracting the cookie from the current request object in the template view and passing it to the requests.get() or requests.post() call. The problem with that is that I have to pass the request object to my API client, which I don't want.
This is the current wrapper I'm using to route my requests:
def wrap_internal_api_call(request, requests_api, uri, data=None, params=None, cookies=None, is_json=False, files=None):
    headers = {'referer': request.META.get('HTTP_REFERER')}
    logger.debug('Request API: %s calling URL: %s', requests_api, uri)
    logger.debug('Referer header sent with requests: %s', headers['referer'])
    if cookies:
        csrf_token = cookies.get('csrftoken', None)
    else:
        csrf_token = request.COOKIES.get('csrftoken', None)
    if csrf_token:
        headers['X-CSRFToken'] = csrf_token
    if data:
        if is_json:
            return requests_api(uri, json=data, params=params, cookies=cookies if cookies else request.COOKIES, headers=headers)
        elif not files:
            return requests_api(uri, data=data, params=params, cookies=cookies if cookies else request.COOKIES, headers=headers)
        else:
            return requests_api(uri, data=data, files=files, params=params, cookies=cookies if cookies else request.COOKIES,
                                headers=headers)
    else:
        return requests_api(uri, params=params, cookies=cookies if cookies else request.COOKIES, headers=headers)
Basically I want to get rid of that request parameter (the 1st param), because otherwise I have to keep passing the request object from TemplateViews to internal services. Also, how can I keep a persistent connection across multiple calls?
REST vs Invoking the view directly
While it's possible for a web app to make a REST API call to itself, that's not what REST is designed for. Consider the request/response cycle described at https://docs.djangoproject.com/ja/1.9/topics/http/middleware/.
As that page shows, a django request/response cycle has quite a bit of overhead. Add to this the overhead of the webserver and the wsgi container. On the client side you have the overhead of the requests library; but hang on a sec, the client also happens to be the same web app, so it becomes part of the web app's overhead too. And then there is the problem of persistence (which I will come to shortly).
Last but not least, if you have a DNS round-robin setup, your request may actually go out on the wire before coming back to the same server. There is a better way: invoke the view directly.
Invoking another view without the REST API call is really easy:
other_app.other_view(request, **kwargs)
This has been discussed a few times here at links such as Django Call Class based view from another class based view and Can I call a view from within another view? so I will not elaborate.
Persistent requests
Persistent http requests (talking about python requests rather than django.http.request.HttpRequest) are managed through session objects (again not to be confused with django sessions). Avoiding confusion is really difficult:
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase
Different hits to your django view will probably come from different users, so you don't want the same cookies reused for the internal REST call. The other problem is that a python requests session object cannot be persisted between two different hits to the django view: sockets cannot generally be serialized, which is a requirement for chucking them into memcached or redis.
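For reference, a minimal sketch of the quoted Session behaviour in isolation (the host is a placeholder); the caveat above is that the persisted cookies belong to whichever user's responses the session has already seen:
import requests

with requests.Session() as s:
    s.headers['X-Internal-Call'] = 'yes'       # persisted across requests
    r1 = s.get('https://api.example.com/a')    # opens a TCP connection
    r2 = s.get('https://api.example.com/b')    # reuses the pooled connection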
If you still want to persist with internal REST
I think @julian's answer shows how to avoid passing the django request instance as a parameter.
If you want to avoid passing the request to wrap_internal_api_call, all you need to do is a bit more work at the call site in the TemplateView. Note that your original wrapper does a lot of cookies if cookies else request.COOKIES; you can factor that out to the calling site. Rewrite your api wrapper as follows:
def wrap_internal_api_call(referer, requests_api, uri, data=None, params=None, cookies=None, is_json=False, files=None):
    headers = {'referer': referer}
    logger.debug('Request API: %s calling URL: %s', requests_api, uri)
    logger.debug('Referer header sent with requests: %s', referer)
    csrf_token = cookies.get('csrftoken', None)
    if csrf_token:
        headers['X-CSRFToken'] = csrf_token
    if data:
        if is_json:
            return requests_api(uri, json=data, params=params, cookies=cookies, headers=headers)
        elif not files:
            return requests_api(uri, data=data, params=params, cookies=cookies, headers=headers)
        else:
            return requests_api(uri, data=data, files=files, params=params, cookies=cookies, headers=headers)
    else:
        return requests_api(uri, params=params, cookies=cookies, headers=headers)
Now, at the place of invocation, instead of
wrap_internal_api_call(request, requests_api, uri, data, params, cookies, is_json, files)
do:
cookies_param = cookies or request.COOKIES
referer_param = request.META.get('HTTP_REFERER')
wrap_internal_api_call(referer_param, requests_api, uri, data, params, cookies_param, is_json, files)
Now you are not passing the request object to the wrapper anymore. This saves a little time because you don't test cookies over and over, but otherwise it doesn't make much difference for performance; in fact, you could achieve the same slight gain just by doing cookies or request.COOKIES once inside the api wrapper.
Networking is usually the tightest bottleneck in an application, so if these internal APIs live on the same machine as your TemplateView, your best bet for performance is to avoid making an API call at all.
Basically I want to get rid of that request parameter (1st param), because then to call it I've to keep passing request object from TemplateViews to internal services.
To pass function args without explicitly passing them into function calls, you can use decorators to wrap your functions and inject the arguments automatically. Combining this with a global variable and some django middleware that registers the request before it reaches your view will solve your problem. See below for an abstracted and simplified version of what I mean.
request_decorators.py
REQUEST = None

def request_extractor(func):
    def extractor(cls, request, *args, **kwargs):
        global REQUEST
        REQUEST = request  # this registers the request arg in the module-level global
        return func(cls, request, *args, **kwargs)
    return extractor

def request_injector(func):
    def injector(*args, **kwargs):
        request = REQUEST
        if len(args) > 0 and callable(args[0]):  # to make it work with class methods
            return func(args[0], request, *args[1:], **kwargs)  # class method
        return func(request, *args, **kwargs)  # function
    return injector
extract_request_middleware.py
See the django docs for info on setting up middleware
from request_decorators import request_extractor

class ExtractRequest:
    @request_extractor
    def process_request(self, request):
        return None
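For completeness, registering the middleware might look roughly like this in settings.py; the dotted path and the old-style MIDDLEWARE_CLASSES setting (matching the Django 1.9 docs linked above) are assumptions:
# settings.py (sketch; adjust the dotted path to wherever the class lives)
MIDDLEWARE_CLASSES = [
    # ...
    'myproject.extract_request_middleware.ExtractRequest',
]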
internal_function.py
from request_decorators import request_injector

@request_injector
def internal_function(request):
    return request
your_view.py
from extract_request_middleware import ExtractRequest
from internal_function import internal_function

def view_with_request(request):
    return internal_function()  # here we don't need to pass in the request arg

def run_test():
    request = "a request!"
    ExtractRequest().process_request(request)
    response = view_with_request(request)
    return response

if __name__ == '__main__':
    assert run_test() == "a request!"
