Python requests - print entire http request (raw)? - python

While using the requests module, is there any way to print the raw HTTP request?
I don't want just the headers, I want the request line, headers, and content printout. Is it possible to see what ultimately is constructed from HTTP request?

Since v1.2.3 Requests added the PreparedRequest object. As per the documentation "it contains the exact bytes that will be sent to the server".
One can use this to pretty print a request, like so:
import requests
req = requests.Request('POST','http://stackoverflow.com',headers={'X-Custom':'Test'},data='a=1&b=2')
prepared = req.prepare()
def pretty_print_POST(req):
"""
At this point it is completely built and ready
to be fired; it is "prepared".
However pay attention at the formatting used in
this function because it is programmed to be pretty
printed and may differ from the actual request.
"""
print('{}\n{}\r\n{}\r\n\r\n{}'.format(
'-----------START-----------',
req.method + ' ' + req.url,
'\r\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
req.body,
))
pretty_print_POST(prepared)
which produces:
-----------START-----------
POST http://stackoverflow.com/
Content-Length: 7
X-Custom: Test
a=1&b=2
Then you can send the actual request with this:
s = requests.Session()
s.send(prepared)
These links are to the latest documentation available, so they might change in content:
Advanced - Prepared requests and API - Lower level classes

import requests
response = requests.post('http://httpbin.org/post', data={'key1': 'value1'})
print(response.request.url)
print(response.request.body)
print(response.request.headers)
Response objects have a .request property which is the PreparedRequest object that was sent.

An even better idea is to use the requests_toolbelt library, which can dump out both requests and responses as strings for you to print to the console. It handles all the tricky cases with files and encodings which the above solution does not handle well.
It's as easy as this:
import requests
from requests_toolbelt.utils import dump
resp = requests.get('https://httpbin.org/redirect/5')
data = dump.dump_all(resp)
print(data.decode('utf-8'))
Source: https://toolbelt.readthedocs.org/en/latest/dumputils.html
You can simply install it by typing:
pip install requests_toolbelt

Note: this answer is outdated. Newer versions of requests support getting the request content directly, as AntonioHerraizS's answer documents.
It's not possible to get the true raw content of the request out of requests, since it only deals with higher level objects, such as headers and method type. requests uses urllib3 to send requests, but urllib3 also doesn't deal with raw data - it uses httplib. Here's a representative stack trace of a request:
-> r= requests.get("http://google.com")
/usr/local/lib/python2.7/dist-packages/requests/api.py(55)get()
-> return request('get', url, **kwargs)
/usr/local/lib/python2.7/dist-packages/requests/api.py(44)request()
-> return session.request(method=method, url=url, **kwargs)
/usr/local/lib/python2.7/dist-packages/requests/sessions.py(382)request()
-> resp = self.send(prep, **send_kwargs)
/usr/local/lib/python2.7/dist-packages/requests/sessions.py(485)send()
-> r = adapter.send(request, **kwargs)
/usr/local/lib/python2.7/dist-packages/requests/adapters.py(324)send()
-> timeout=timeout
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(478)urlopen()
-> body=body, headers=headers)
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(285)_make_request()
-> conn.request(method, url, **httplib_request_kw)
/usr/lib/python2.7/httplib.py(958)request()
-> self._send_request(method, url, body, headers)
Inside the httplib machinery, we can see HTTPConnection._send_request indirectly uses HTTPConnection._send_output, which finally creates the raw request and body (if it exists), and uses HTTPConnection.send to send them separately. send finally reaches the socket.
Since there's no hooks for doing what you want, as a last resort you can monkey patch httplib to get the content. It's a fragile solution, and you may need to adapt it if httplib is changed. If you intend to distribute software using this solution, you may want to consider packaging httplib instead of using the system's, which is easy, since it's a pure python module.
Alas, without further ado, the solution:
import requests
import httplib
def patch_send():
old_send= httplib.HTTPConnection.send
def new_send( self, data ):
print data
return old_send(self, data) #return is not necessary, but never hurts, in case the library is changed
httplib.HTTPConnection.send= new_send
patch_send()
requests.get("http://www.python.org")
which yields the output:
GET / HTTP/1.1
Host: www.python.org
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.1.0 CPython/2.7.3 Linux/3.2.0-23-generic-pae

requests supports so called event hooks (as of 2.23 there's actually only response hook). The hook can be used on a request to print full request-response pair's data, including effective URL, headers and bodies, like:
import textwrap
import requests
def print_roundtrip(response, *args, **kwargs):
format_headers = lambda d: '\n'.join(f'{k}: {v}' for k, v in d.items())
print(textwrap.dedent('''
---------------- request ----------------
{req.method} {req.url}
{reqhdrs}
{req.body}
---------------- response ----------------
{res.status_code} {res.reason} {res.url}
{reshdrs}
{res.text}
''').format(
req=response.request,
res=response,
reqhdrs=format_headers(response.request.headers),
reshdrs=format_headers(response.headers),
))
requests.get('https://httpbin.org/', hooks={'response': print_roundtrip})
Running it prints:
---------------- request ----------------
GET https://httpbin.org/
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
None
---------------- response ----------------
200 OK https://httpbin.org/
Date: Thu, 14 May 2020 17:16:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 9593
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
<!DOCTYPE html>
<html lang="en">
...
</html>
You may want to change res.text to res.content if the response is binary.

Here is a code, which makes the same, but with response headers:
import socket
def patch_requests():
old_readline = socket._fileobject.readline
if not hasattr(old_readline, 'patched'):
def new_readline(self, size=-1):
res = old_readline(self, size)
print res,
return res
new_readline.patched = True
socket._fileobject.readline = new_readline
patch_requests()
I spent a lot of time searching for this, so I'm leaving it here, if someone needs.

A fork of #AntonioHerraizS answer (HTTP version missing as stated in comments)
Use this code to get a string representing the raw HTTP packet without sending it:
import requests
def get_raw_request(request):
request = request.prepare() if isinstance(request, requests.Request) else request
headers = '\r\n'.join(f'{k}: {v}' for k, v in request.headers.items())
body = '' if request.body is None else request.body.decode() if isinstance(request.body, bytes) else request.body
return f'{request.method} {request.path_url} HTTP/1.1\r\n{headers}\r\n\r\n{body}'
headers = {'User-Agent': 'Test'}
request = requests.Request('POST', 'https://stackoverflow.com', headers=headers, json={"hello": "world"})
raw_request = get_raw_request(request)
print(raw_request)
Result:
POST / HTTP/1.1
User-Agent: Test
Content-Length: 18
Content-Type: application/json
{"hello": "world"}
💡 Can also print the request in the response object
r = requests.get('https://stackoverflow.com')
raw_request = get_raw_request(r.request)
print(raw_request)

I use the following function to format requests. It's like #AntonioHerraizS except it will pretty-print JSON objects in the body as well, and it labels all parts of the request.
format_json = functools.partial(json.dumps, indent=2, sort_keys=True)
indent = functools.partial(textwrap.indent, prefix=' ')
def format_prepared_request(req):
"""Pretty-format 'requests.PreparedRequest'
Example:
res = requests.post(...)
print(format_prepared_request(res.request))
req = requests.Request(...)
req = req.prepare()
print(format_prepared_request(res.request))
"""
headers = '\n'.join(f'{k}: {v}' for k, v in req.headers.items())
content_type = req.headers.get('Content-Type', '')
if 'application/json' in content_type:
try:
body = format_json(json.loads(req.body))
except json.JSONDecodeError:
body = req.body
else:
body = req.body
s = textwrap.dedent("""
REQUEST
=======
endpoint: {method} {url}
headers:
{headers}
body:
{body}
=======
""").strip()
s = s.format(
method=req.method,
url=req.url,
headers=indent(headers),
body=indent(body),
)
return s
And I have a similar function to format the response:
def format_response(resp):
"""Pretty-format 'requests.Response'"""
headers = '\n'.join(f'{k}: {v}' for k, v in resp.headers.items())
content_type = resp.headers.get('Content-Type', '')
if 'application/json' in content_type:
try:
body = format_json(resp.json())
except json.JSONDecodeError:
body = resp.text
else:
body = resp.text
s = textwrap.dedent("""
RESPONSE
========
status_code: {status_code}
headers:
{headers}
body:
{body}
========
""").strip()
s = s.format(
status_code=resp.status_code,
headers=indent(headers),
body=indent(body),
)
return s

test_print.py content:
import logging
import pytest
import requests
from requests_toolbelt.utils import dump
def print_raw_http(response):
data = dump.dump_all(response, request_prefix=b'', response_prefix=b'')
return '\n' * 2 + data.decode('utf-8')
#pytest.fixture
def logger():
log = logging.getLogger()
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)
return log
def test_print_response(logger):
session = requests.Session()
response = session.get('http://127.0.0.1:5000/')
assert response.status_code == 300, logger.warning(print_raw_http(response))
hello.py content:
from flask import Flask
app = Flask(__name__)
#app.route('/')
def hello_world():
return 'Hello, World!'
Run:
$ python -m flask hello.py
$ python -m pytest test_print.py
Stdout:
------------------------------ Captured log call ------------------------------
DEBUG urllib3.connectionpool:connectionpool.py:225 Starting new HTTP connection (1): 127.0.0.1:5000
DEBUG urllib3.connectionpool:connectionpool.py:437 http://127.0.0.1:5000 "GET / HTTP/1.1" 200 13
WARNING root:test_print_raw_response.py:25
GET / HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
HTTP/1.0 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 13
Server: Werkzeug/1.0.1 Python/3.6.8
Date: Thu, 24 Sep 2020 21:00:54 GMT
Hello, World!

Related

Sending images by POST using python requests

I am trying to send through an image using the following code: This is just a part of my code, I didn't include the headers here but they are set up correctly, with content-type as content-type: multipart/form-data; boundary=eBayClAsSiFiEdSpOsTiMaGe
img = "piano.jpg"
f = open(img,'rb')
out = f.read()
files = {'file':out}
p = requests.post("https://ecg-api.gumtree.com.au/api/pictures",headers=headers, data=files)
f.close()
I get a 400 error incorrect multipart/form-data format
How do I send the image properly?
Extra Details:
Network analysis shows the following request been sent:
POST https://ecg-api.gumtree.com.au/api/pictures HTTP/1.1
host: ecg-api.gumtree.com.au
content-type: multipart/form-data; boundary=eBayClAsSiFiEdSpOsTiMaGe
authorization: Basic YXV5grehg534
accept: */*
x-ecg-ver: 1.49
x-ecg-ab-test-group: gblios_9069_b;gblios-8982-b
accept-encoding: gzip
x-ecg-udid: 73453-7578p-8657
x-ecg-authorization-user: id="1635662", token="ee56hgjfjdghgjhfj"
accept-language: en-AU
content-length: 219517
user-agent: Gumtree 12.6.0 (iPhone; iOS 13.3; en_AU)
x-ecg-original-machineid: Gk435454-hhttehr
Form data:
file: ����..JFIF.....H.H..��.LExif..MM.*..................�i.........&......�..
I cut off the the formdata part for file as its too long. My headers are written as follows (I have made up the actual auth values here):
idd = "1635662"
token = "ee56hgjfjdghgjhfj"
headers = {
"authority":"ecg-api.gumtree.com.au",
"content-type":"multipart/form-data; boundary=eBayClAsSiFiEdSpOsTiMaGe",
"authorization":"Basic YXV5grehg534",
"accept":"*/*",
"x-ecg-ver":"1.49",
"x-ecg-ab-test-group":"gblios_9069_b;gblios-8982-b",
"accept-encoding":"gzip",
"x-ecg-udid":"73453-7578p-8657",
"x-ecg-authorization-user":f"id={idd}, token={token}",
"accept-language":"en-AU",
"content-length":"219517",
"user-agent":"Gumtree 12.6.0 (iPhone; iOS 13.3; en_AU)",
"x-ecg-original-machineid":"Gk435454-hhttehr"
}
Maybe its the way I have written the headers? I suspect its the way I have written the x-ecg-authorization-user part in headers? Because I realise even putting random values for the token or id gives me the same 400 error incorrect multipart/form-data format
You can try the following code. Don't set content-type in the headers.Let Pyrequests do that for you
files = {'file': (os.path.basename(filename), open(filename, 'rb'), 'application/octet-stream')}
upload_files_url = "url"
headers = {'Authorization': access_token, 'JWTAUTH': jwt}
r2 = requests.post(parcels_files_url, files=files, headers=headers)

HTTP post Json 400 Error

I am trying to post data to my server from my microcontroller. I need to send raw http data from my controller and this is what I am sending below:
POST /postpage HTTP/1.1
Host: https://example.com
Accept: */*
Content-Length: 18
Content-Type: application/json
{"cage":"abcdefg"}
My server requires JSON encoding and not form encoded request.
For the above request sent, I get an 400 error from the server, HTTP/1.1 400 Bad Request
However, when I try to reach the post to my server via a python script via my laptop, I am able to get a proper response.
import requests
url='https://example.com'
mycode = 'abcdefg'
def enter():
value = requests.post('url/postpage',
params={'cage': mycode})
print vars(value)
enter()
Can anyone please let me know where I could be going wrong in the raw http data I'm sending above ?
HTTP specifies the separator between headers as a single newline, and requires a double newline before the content:
POST /postpage HTTP/1.1
Host: https://example.com
Accept: */*
Content-Length: 18
Content-Type: application/json
{"cage":"abcdefg"}
If you don’t think you’ve got all of the request right, try seeing what was sent by Python:
response = ...
request = response.request # request is a PreparedRequest.
headers = request.headers
url = request.url
Read the docs for PreparedRequest for more information.
To pass a parameter, use this Python:
REQUEST = 'POST /postpage%s HTTP/1.1\r\nHost: example.com\r\nContent-Length: 0\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.4.3 CPython/2.7.9 Linux/4.4.11-v7+\r\n\r\n';
query = ''
for k, v in params.items():
query += '&' + k + '=' + v # URL-encode here if you want.
if len(query): query = '?' + query[1:]
return REQUEST % query

URL rules of python requests

Recently I ported my client upload code from HTTPConnection to requests. On uploading a image:
file_name ='/path/to/216169_1286900924tait.jpg?pt=5&ek=1'
The image stored on disk is really the name, and I want to upload it to remote server with same path and name, so I constructed the request like this:
url = 'http://host/bucket_name/%s' % (file_name)
headers = {...} # some other headers
with open(file_name, 'rb') as fd:
data = fd.read()
r = requests.put(url, data=data, headers=headers)
assert(r.status_code==200)
....
But the request send to server changed to this:
/path/to/216169_1286900924tait.jpg
requests should encode the tail as %3Fpt%3D5%26ek%3D1, but it seems that requests do nothing on url-encode of url, I think it may matched the ?pt=5&ek=1 pattern to request parameters, how to make requests convert urls blindly without the pattern match?
Update:
Server get the trimmed url and calculated the signature with it, and so do not match the signature I calculated with original url, so 403 returned.
You might have a problem on the way how you construct the URL:
>>> payload = {'pt': 5, 'ek': '1'}
>>> r = requests.get('http://host/bucket_name/file_name', params=payload)
if you call the print(r.url) you should have the right form.
Why should requests presume to encode the query parameters? It does not know that you don't want that part of the URL treated as the query string. Besides, the request is sent as is to the server, the query string is not omitted as you suggest. You can verify that with nc:
# run nc server
$ nc -l 1234
# then send request from Python
>>> requests.put('http://localhost:1234/path/to/216169_1286900924tait.jpg?pt=5&ek=1', data='any old thing')
nc will display the request:
PUT /path/to/216169_1286900924tait.jpg?pt=5&ek=1 HTTP/1.1
Host: localhost:1234
Content-Length: 13
User-Agent: python-requests/2.9.1
Connection: keep-alive
Accept: */*
Accept-Encoding: gzip, deflate
any old thing
So it is the remote server that is (correctly according to the HTTP protocol) interpreting the ?pt=5&ek=1 part of the file name as query parameters. What else should it do?
For comparison, since I assume that it previously worked with httplib.HTTPConnection:
>>> import httplib
>>> r = httplib.HTTPConnection('localhost', 1234)
>>> r.request('PUT', '/path/to/216169_1286900924tait.jpg?pt=5&ek=1', 'hello from httplib')
generates this request:
PUT /path/to/216169_1286900924tait.jpg?pt=5&ek=1 HTTP/1.1
Host: localhost:1234
Accept-Encoding: identity
Content-Length: 18
hello from httplib
Note that there is no difference in the way the URL is sent.
I dig into the requests source code, I find the following line of code(yes, requests based on urllib3):
scheme, auth, host, port, path, query, fragment = urllib3.util.parse_url(url)
It seems like that, you should url-encode your url manually during construct your url string, for example:
>>> path = '''~!##$^&*()_+|}{":?><`-=\\][\';.,'''
>>> url = 'http://host.com/bucket/%s' % path
>>> urllib3.util.parse_url(url)
>>> Url(scheme='http', auth=None, host='host.com', port=None, path='/bucket/~!#', query=None, fragment='$^&*()_+|}{":?><`-=B%7C%7D%7B%22%3A%3F%3E%3C%60-%3D%5C%5D%5B%27%3B.%2C')
Notice the path field output, not the same as path, if you encode path:
>>> path = '''~!##$^&*()_+|}{":?><`-=\\][\';.,'''
>>> url = 'http://host.com/bucket/%s' % (urllib.quote(path, ''))
>>> print url
>>> http://host.com/bucket/%7E%21%40%23%24%25%5E%26%2A%28%29_%2B%7C%7D%7B%22%3A%3F%3E%3C%60-%3D%5C%5D%5B%27%3B.%2C
>>> urllib3.util.parse_url(url)
>>> Url(scheme='http', auth=None, host='host.com', port=None, path='/bucket/%7E%21%40%23%24%25%5E%26%2A%28%29_%2B%7C%7D%7B%22%3A%3F%3E%3C%60-%3D%5C%5D%5B%27%3B.%2C', query=None, fragment=None)
this what I want. But if you want to pass tome unicode characters into path, you do not need to encode them, they were automatically transferred into %xx%xx format. But url encode is a good advise for any characters you passed into URL.

python parse http response (string)

I'm using python 2.7 and I want to parse string HTTP response fields which I already extracted from a text file. What would be the easiest way? I can parse requests by using the BaseHTTPServer but couldn't manage to find something for the responses.
The responses I have are pretty standard and in the following format
HTTP/1.1 200 OK
Date: Thu, Jul 3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626
Thanks in advance,
You might find this useful, keep in mind that HTTPResponse wasn't designed to be "instantiated directly by user."
Also note that the content-length header in your response string may not be valid any more (it depends on how you've aquired these responses) this just means that the call to HTTPResponse.read() needs to have value larger than the content in order to get it all.
In python 2 it can be run this way.
from httplib import HTTPResponse
from StringIO import StringIO
http_response_str = """HTTP/1.1 200 OK
Date: Thu, Jul 3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626"""
class FakeSocket():
def __init__(self, response_str):
self._file = StringIO(response_str)
def makefile(self, *args, **kwargs):
return self._file
source = FakeSocket(http_response_str)
response = HTTPResponse(source)
response.begin()
print "status:", response.status
print "single header:", response.getheader('Content-Type')
print "content:", response.read(len(http_response_str)) # the len here will give a 'big enough' value to read the whole content
In python 3, the HTTPResponse is imported from http.client, and the response to be parsed needs to be byte encoded. Depending on where the data is gotten from this may be done already or need to be called explicitly
from http.client import HTTPResponse
from io import BytesIO
http_response_str = """HTTP/1.1 200 OK
Date: Thu, Jul 3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626
teststring"""
http_response_bytes = http_response_str.encode()
class FakeSocket():
def __init__(self, response_bytes):
self._file = BytesIO(response_bytes)
def makefile(self, *args, **kwargs):
return self._file
source = FakeSocket(http_response_bytes)
response = HTTPResponse(source)
response.begin()
print( "status:", response.status)
# status: 200
print( "single header:", response.getheader('Content-Type'))
# single header: text/xml; charset="utf-8"
print( "content:", response.read(len(http_response_str)))
# content: b'teststring'
You might want to consider using python-requests.
Link: http://docs.python-requests.org/en/latest/
Here is an example from http://dancallahan.info/journal/python-requests/
Considering your responses are compliant with HTTP RFC
Does this look like something you want to do?
>>> import requests
>>> url = 'http://example.test/'
>>> response = requests.get(url)
>>> response.status_code
200
>>> response.headers['content-type']
'text/html; charset=utf-8'
>>> response.content
u'Hello, world!'

'urllib2.urlopen' adding Host header

I'm using Observium to pull Nginx stats on localhost however it returns '405 Not Allowed':
# curl -I localhost/nginx_status
HTTP/1.1 405 Not Allowed
Server: nginx
Date: Wed, 19 Jun 2013 22:12:37 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 166
Connection: keep-alive
Keep-Alive: timeout=5
# curl -I -H "Host: example.com" localhost/nginx_status
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 19 Jun 2013 22:12:43 GMT
Content-Type: text/plain
Connection: keep-alive
Keep-Alive: timeout=5
Could you please advise how to add Host header with 'urllib2.urlopen' in Python (Python 2.6.6
):
Current script:
#!/usr/bin/env python
import urllib2
import re
data = urllib2.urlopen('http://localhost/nginx_status').read()
params = {}
for line in data.split("\n"):
smallstat = re.match(r"\s?Reading:\s(.*)\sWriting:\s(.*)\sWaiting:\s(.*)$", line)
req = re.match(r"\s+(\d+)\s+(\d+)\s+(\d+)", line)
if smallstat:
params["Reading"] = smallstat.group(1)
params["Writing"] = smallstat.group(2)
params["Waiting"] = smallstat.group(3)
elif req:
params["Requests"] = req.group(3)
else:
pass
dataorder = [
"Active",
"Reading",
"Writing",
"Waiting",
"Requests"
]
print "<<<nginx>>>\n";
for param in dataorder:
if param == "Active":
Active = int(params["Reading"]) + int(params["Writing"]) + int(params["Waiting"])
print Active
else:
print params[param]
You might want to check out the urllib2 missing manual for more information, but basically you create a dictionary of your header labels and values and pass it to the urllib2.Request method. A (slightly) modified version of the code from the linked manual:
from urllib import urlencode
from urllib2 import Request urlopen
# Define values that we'll pass to our urllib and urllib2 methods
url = 'http://www.something.com/blah'
user_host = 'example.com'
values = {'name' : 'Engineero', # dict of keys and values for our POST data
'location' : 'Interwebs',
'language' : 'Python' }
headers = { 'Host' : user_host } # dict of keys and values for our header
# Set up our request, execute, and read
data = urlencode(values) # encode for sending URL request
req = Request(url, data, headers) # make POST request to url with data and headers
response = urlopen(req) # get the response from the server
the_page = response.read() # read the response from the server
# Do other stuff with the response

Categories