Because requests was adding unwanted headers, I decided to prepare the request manually and use Session.send().
Sadly, the following code produces the wrong request:
import requests
ARCHIVE_URL = "http://10.0.0.10/post/tmp/archive.zip"
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cache-Control': 'no-cache',
    'Connection': 'Keep-Alive',
    'Host': '10.0.0.10'
}
DataToSend = 'data'
req = requests.Request('POST', ARCHIVE_URL, data=DataToSend, headers=headers)
prepped = req.prepare()
s = requests.Session()
response = s.send(prepped)
If I look at the request using Fiddler, I get this:
GET http://10.0.0.10/tmp/archive.zip HTTP/1.1
Accept-Encoding: identity
Connection: Keep-Alive
Host: 10.0.0.10
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded
What am I missing?
Since the prepared request is not connected to the session when you use req.prepare() instead of s.prepare_request(req) (where s is the session), you must specify the request headers yourself; no defaults come from the session object.
Use s.prepare_request(req) instead of req.prepare(), or specify the headers dictionary explicitly.
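A minimal sketch of the suggested change, reusing the URL and headers from the question (untested against that server):
import requests

ARCHIVE_URL = "http://10.0.0.10/post/tmp/archive.zip"
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cache-Control': 'no-cache',
    'Connection': 'Keep-Alive',
    'Host': '10.0.0.10'
}

s = requests.Session()
req = requests.Request('POST', ARCHIVE_URL, data='data', headers=headers)
# prepare_request() merges the session's defaults (headers, cookies, auth)
# into the prepared request; req.prepare() does not.
prepped = s.prepare_request(req)
response = s.send(prepped)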
I am trying to access a site with bot prevention.
With the following script using requests I can access the site:
request = requests.get(url, headers={**HEADERS, 'Cookie': cookies})
and I get the desired HTML. But when I use aiohttp:
async def get_data(session: aiohttp.ClientSession, url, cookies):
    async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
        text = await response.text()
        print(text)
I get the bot prevention page as the response.
These are the headers I use for all the requests:
HEADERS = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Host': 'www.dnb.com',
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br'
}
I have compared the request headers of both requests.get and aiohttp and they are identical.
Is there any reason the results are different? If so, why?
EDIT: I've checked the httpx module; the problem occurs there as well, both with httpx.Client() and httpx.AsyncClient().
response = httpx.request('GET', url, headers={**HEADERS, 'Cookie': cookies})
doesn't work either (not asynchronous).
I tried capturing packets with Wireshark to compare requests and aiohttp.
Server:
import http.server

server = http.server.HTTPServer(("localhost", 8080),
                                http.server.SimpleHTTPRequestHandler)
server.serve_forever()
with requests:
import requests
url = 'http://localhost:8080'
HEADERS = {'Content-Type': 'application/json'}
cookies = ''
request = requests.get(url,headers={**HEADERS,'Cookie': cookies})
requests packet:
GET / HTTP/1.1
Host: localhost:8080
User-Agent: python-requests/2.27.1
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
Content-Type: application/json
Cookie:
with aiohttp:
import aiohttp
import asyncio
url = 'http://localhost:8080'
HEADERS = {'Content-Type': 'application/json'}
cookies = ''
async def get_data(session: aiohttp.ClientSession, url, cookies):
    async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
        text = await response.text()
        print(text)

async def main():
    async with aiohttp.ClientSession() as session:
        await get_data(session, url, cookies)

asyncio.run(main())
aiohttp packet:
GET / HTTP/1.1
Host: localhost:8080
Content-Type: application/json
Cookie:
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Python/3.10 aiohttp/3.8.1
If the site seems to accept packets from requests, then you could try making the aiohttp packet identical by setting the headers:
HEADERS = {
    'User-Agent': 'python-requests/2.27.1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': '*/*',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json',
    'Cookie': ''
}
If you haven't already, I suggest capturing the request with Wireshark to make sure aiohttp isn't messing with your headers.
You can also try other user-agent strings, or send the headers in a different order. The order is not supposed to matter, but some sites check it anyway for bot protection (as in this question).
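A rough sketch of sending those headers with aiohttp (url and the HEADERS dict above are assumed to be the values from the question; there is no guarantee this passes the bot check):
import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # HEADERS is the requests-style dict defined just above; aiohttp adds
        # its own defaults (User-Agent, Accept, Accept-Encoding) only when
        # they are not already present, so these values are sent as-is.
        async with session.get(url, headers=HEADERS) as response:
            print(await response.text())

asyncio.run(main())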
I'm able to successfully POST some CSV data to an external server using the httr package in R, using a post request like so:
request <- POST(url, body = upload_file('my_table.csv'), verbose())
The detail provided by the verbose() option above tells me that the post request headers look like this:
-> User-Agent: libcurl/7.68.0 r-curl/4.3.1 httr/1.4.2
-> Accept-Encoding: deflate, gzip, br
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: text/csv
-> Content-Length: 77
->
What I'm trying to do is emulate this call with the Python requests module (because the rest of the package is in Python), and I'm using the following code:
response = requests.post(url, files={'file': open('my_table.csv', 'rb')})
However, I'm getting an Error: File type not supported response from the server when I do so. When I look at the details of my POST request in Python, I see the following headers:
{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '226', 'Content-Type': 'multipart/form-data; boundary=b8f99a72145547743d035be5d9c1e983'}
What is the cleanest way for me to upload and post this CSV data such that the server might accept it?
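For reference, a sketch of what a closer emulation of the httr call might look like (an assumption: the server wants the raw CSV as the request body with Content-Type: text/csv, which is what upload_file() produced above, rather than a multipart form):
import requests

url = 'https://example.com/upload'  # placeholder for the real endpoint

# Send the file contents as the raw request body and set Content-Type
# explicitly, so requests adds no multipart boundary.
with open('my_table.csv', 'rb') as f:
    response = requests.post(url, data=f, headers={'Content-Type': 'text/csv'})

print(response.status_code)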
I'm attempting to use this API endpoint to upload a file:
https://h.app.wdesk.com/s/cerebral-docs/?python#uploadfileusingpost
With this python function:
import requests

def upload_file(token, filepath, table_id):
    url = "https://h.app.wdesk.com/s/wdata/prep/api/v1/file"
    headers = {
        'Accept': 'application/json',
        'Authorization': f'Bearer {token}'
    }
    files = {
        "tableId": (None, table_id),
        "file": open(filepath, "rb")
    }
    resp = requests.post(url, headers=headers, files=files)
    print(resp.request.headers)
    return resp.json()
The Content-Type and Content-Length headers are computed and added by the requests library internally as per their documentation. When assigning to the files kwarg in the post function, the library knows it's supposed to be a multipart/form-data request.
The print out of the request header is as follows, showing the Content-Type and Content-Length that the library added. I've omitted the auth token.
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive',
'Authorization': 'Bearer <omitted>', 'Content-Length': '8201', 'Content-Type': 'multipart/form-data; boundary=bb582b9071574462d44c4b43ec4d7bf3'}
The json response from the API is:
{'body': ['contentType must not be null'], 'code': 400}
The odd thing is that the same request, when made through Postman, gives a different response - which is what I expected from Python as well.
{ "code": 409, "body": "duplicate file name" }
These are the Postman request headers:
POST /s/wdata/prep/api/v1/file HTTP/1.1
Authorization: Bearer <omitted>
Accept: */*
Cache-Control: no-cache
Postman-Token: 34ed08d4-4467-4168-a4e4-c83b16ce9afb
Host: h.app.wdesk.com
Content-Type: multipart/form-data; boundary=--------------------------179907322036790253179546
Content-Length: 8279
The Postman request also computes the Content-Type and Content-Length headers when the request is sent; they are not user-specified.
I am quite confused as to why I'm getting two different behaviors from the API service for the same request.
There must be something I'm missing and can't figure out what it is.
Figured out what was wrong with my request by comparing it to NodeJS and Postman.
The contentType being referred to in the API's error message was the file parameter's content type, not the http request header Content-Type.
The upload started to work flawlessly when I updated my file parameter like so:
from pathlib import Path  # for Path(filepath).name

files = {
    "tableId": (None, table_id),
    "file": (Path(filepath).name, open(filepath, "rb"), "text/csv", None)
}
I learned that Python's requests library will not automatically add the file's MIME type to the request body; we need to be explicit about it.
Hope this helps someone else too.
Using a Session from the requests module in Python, it seems that the session sends the authorization header only with the first request, and I can't understand why this happens.
import requests
session = requests.Session()
session.auth = (u'user', 'test')
session.verify = False
response = session.get(url='https://my_url/rest/api/1.0/users')
If I look at this response's request headers, I see:
{'Authorization': 'Basic auth_data', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.12.3'}
but if I send the next request, using the same URL or a different one:
response = session.get(url='https://my_url/rest/api/1.0/users')
I can see that there is no auth header in request anymore:
print response.request.headers
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.12.3'}
And I'm getting 401 response because of it.
Why is that? Shouldn't the session send auth with every request made through it?
How can I send auth data with every request using session?
What I see when I run that exact code in your comment is that the Authorization header is missing in the first print, yet it is present in the second. This seems to be the opposite of the problem that you report.
This is explained by the fact that the first request is redirected by a 301 response, and the auth header is not propagated in the follow-up request to the redirected location. You can see that the auth header was sent in the initial request by looking at response.history[0].request.headers.
The second request is not redirected because the session has kept the connection to the host open (due to the Connection: keep-alive header), so the auth headers appear when you print response.request.headers.
I doubt that you are actually using https://test.com, but probably a similar thing is happening with the server that you are using.
For testing I recommend using the very handy public test HTTP server https://httpbin.org/headers. This will return the headers received by the server in the response body. You can test redirected requests with one of the redirect URLs.
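A minimal sketch of that check with httpbin (the endpoints are the ones documented on httpbin.org; the credentials are placeholders):
import requests

session = requests.Session()
session.auth = ('user', 'test')

# /headers echoes back the headers the server received.
print(session.get('https://httpbin.org/headers').json())

# For a redirected request, the request sent before the redirect is kept in
# response.history, so its headers can be inspected separately from the
# headers of the final request.
response = session.get('https://httpbin.org/redirect-to?url=/headers')
print(response.history[0].request.headers)
print(response.request.headers)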
I didn't find any reliable answer on how to pass auth info when using a requests session in Python, so below is my finding:
with requests.sessions.Session() as session:
    session.auth = ("username", "password")
    # Make any requests here without providing auth info again
    session.get("http://www.example.com/users")
I'm trying to upload an image using requests on python.
This is what I send using the browser:
POST /upload-photo/{res1}/{res2}/{res3}/ HTTP/1.1
Host: tgt.tgdot.com
Connection: keep-alive
Content-Length: 280487
Authorization: Basic {value}=
Accept: */*
Origin: http://tgt.tgdot.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryA8sGeB48ZZCvG127
Referer: http://tgt.tgdot.com/{res1}/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: fttoken={cookie_value}
This is my code:
with open(os.getcwd()+"/images/thee1.JPG", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read())
headers = {"Content-Type":"multipart/form-data", "Authorization":"Basic " + authvalue}
cookie = {cookiename: token.value}
r = requests.post(url, headers =headers, cookies = cookie, params=encoded_image)
print r.request.headers
print r.status_code
print r.text
I keep getting 414 Request-URI Too Large.
I'm not sure what's missing here; I would really appreciate help.
You are encoding the whole image into the request parameters, effectively extending the URL by the length of the image.
If you already encoded the image data, use the data parameter:
r = requests.post(url, headers=headers, cookies=cookie, data=encoded_image)
Note that requests can encode multipart/form-data POST bodies directly; there is no need for you to encode it yourself. Use the files parameter in that case, passing in a dictionary or sequence of tuples. See the POST Multiple Multipart-Encoded Files section of the documentation.
The library can also handle a username and password pair to handle the Authorization header; simply pass in a (username, password) tuple for the auth keyword argument.
Encoding an image to Base64 is not sufficient, however; your Content-Type header and your POST payload do not match. You'd instead post the file with a field name:
with open(os.getcwd() + "/images/thee1.JPG", "rb") as image_file:
    files = {'field_name': image_file}
    cookie = {cookiename: token.value}
    r = requests.post(url, cookies=cookie, files=files, auth=(username, password))
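If the server also cares about the part's filename or MIME type, the file can be passed as a tuple instead (a sketch; 'field_name' and the image/jpeg type are assumptions, and url, cookiename, token, username and password are the values from the question):
with open(os.getcwd() + "/images/thee1.JPG", "rb") as image_file:
    # (filename, file object, content type) for the multipart part
    files = {'field_name': ('thee1.JPG', image_file, 'image/jpeg')}
    r = requests.post(url, cookies={cookiename: token.value}, files=files,
                      auth=(username, password))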