While using requests to download a webpage, we store the result of that operation in a response object. What I could not understand is: what exactly is stored in the response object? Is it the HTML source code of that page, or is it the entire string of text on the page that is stored?
It is an instance of the lower-level Response class of the Python requests library. The literal description from the documentation is:
The Response object, which contains a server's response to an HTTP request.
Every HTTP request sent returns a response from the server (the Response object) which includes quite a bit of information.
You can find all the info you need in the requests documentation, and the source is also available on GitHub.
Server and Client use HTTP Protocol to send/receive information.
response stores all the information from the server: the status code, the HTTP headers (which may set cookies, for example), and the HTTP body (mostly HTML, but it can be JSON, a file, or something else).
Wikipedia: HTTP protocol
BTW: a request stores HTTP headers and an HTTP body too (sometimes the HTTP body can be empty).
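For example, with requests you can look at each part of the stored response directly (example.com is just a placeholder URL here):

import requests

response = requests.get('https://example.com/')  # placeholder URL

print(response.status_code)  # numeric status code, e.g. 200
print(response.headers)      # the HTTP response headers (a case-insensitive dict)
print(response.cookies)      # any cookies the server set
print(response.text)         # the body decoded as text (the HTML source here)
print(response.content)      # the raw body as bytes
# For a JSON API, response.json() parses the body into Python objects.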
I am trying to make a request to a webpage using the code below, and I am getting response 444. Is there anything I can do about it?
import requests
url = "https://www.pseudosite.com/"
response = requests.get(url)
print(response) # <Response [444]>
The http.dev website says the following:
When the 444 No Response status code is generated, the server returns no information to the client and closes the HTTP connection. This error message can be found in the nginx logs and will not be sent to the client. It is useful for dealing with malicious HTTP requests, such as one that includes an illegal Host header.
I am trying to web-scrape that website using Python, but I am blocked at the first step.
I believe that you need to add headers to view this webpage. If you open devtools in your browser, you should see a GET request; if you then click on the 'Headers' tab, you can build a dictionary from that data.
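For example, a sketch with placeholder header values (copy the real ones from the request shown in your devtools):

import requests

url = "https://www.pseudosite.com/"

# Placeholder values: use the ones your own browser actually sends.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

response = requests.get(url, headers=headers)
print(response)  # hopefully <Response [200]> once the server accepts the request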
I am trying to write an Azure Function to manage SSO between two services. The first one will host the link to the HTTP-triggered Azure Function, which should then respond with the formatted SAML Response, which in turn gets sent to the consumption URL as a POST; but I can only make GET requests with the azure.functions.HttpResponse method needed to return outputs from Azure Functions (unless I'm wrong).
Alternatively, I've tried setting the cookie that I get as a response from sending the SAML Response with the Python requests method, but the consumption URL doesn't seem to care that the cookie is there and just brings me back to the login page.
The SP in this situation is Absorb LMS, and I can confirm that the SAML Response is formatted correctly, because submitting it from an HTML form works fine (I've also tried returning that form as the body of the azure.functions.HttpResponse, but I just get HTTP errors which I can't make heads or tails of).
import requests
import azure.functions as func

# Placeholders below: saml_response_b64 is the base64-encoded SAML response
# and assertion; acs_url is the consumption (ACS) URL.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}
body = {"SAMLResponse": saml_response_b64}
response = requests.post(url=acs_url, headers=headers, data=body)

headers = response.headers
headers['Location'] = acs_url
return func.HttpResponse(headers=headers, status_code=302)
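Since you mention that submitting the SAML Response from an HTML form works fine, one option (the standard SAML HTTP-POST binding) is to have the function return an auto-submitting form rather than a redirect. A minimal sketch, reusing the saml_response_b64 and acs_url placeholders from above:

import azure.functions as func

def build_sso_response(saml_response_b64: str, acs_url: str) -> func.HttpResponse:
    # The browser itself POSTs the SAMLResponse to the ACS URL on load,
    # so the function never has to issue the POST on the client's behalf.
    html = f"""
    <html>
      <body onload="document.forms[0].submit()">
        <form method="POST" action="{acs_url}">
          <input type="hidden" name="SAMLResponse" value="{saml_response_b64}" />
        </form>
      </body>
    </html>
    """
    return func.HttpResponse(html, mimetype="text/html", status_code=200)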
I am trying to log in to a website by passing a username and password. It says the session cookie is missing. I am a beginner with APIs, and I don't know if I have missed something here. The website is http://testing-ground.scraping.pro/login
import urllib3
http = urllib3.PoolManager()
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = http.request('POST', url, fields={'usr':'admin','pwd':'12345'})
print(req.data.decode('utf-8'))
There are two issues in your code that prevent you from logging in successfully.
The content-type issue
In your code you are using urllib3 to send data with content-type multipart/form-data. The website, however, seems to accept only the content-type application/x-www-form-urlencoded.
Try the following cURL commands:
curl -v -d "usr=admin&pwd=12345" http://testing-ground.scraping.pro/login?mode=login
curl -v -F "usr=admin" -F "pwd=12345" http://testing-ground.scraping.pro/login?mode=login
For the first one, the content-type in your request header is application/x-www-form-urlencoded, so the website takes it and logs you in (with a 302 Found response).
The second one, however, sends data with content-type multipart/form-data. The website doesn't take it and therefore rejects your login request (with a 200 OK response).
The cookie issue
Another issue is that urllib3 follows redirects by default. More importantly, cookies are not handled (i.e. stored and sent in subsequent requests) by urllib3 by default. Thus, the second request won't contain the cookie tdsess=TEST_DRIVE_SESSION, and the website therefore responds that you're not logged in.
If you only care about the login request, you can try the following code:
import urllib3
http = urllib3.PoolManager()
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = http.request('POST', url, fields={'usr':'admin','pwd':'12345'}, encode_multipart=False, redirect=False)
print(req.data.decode('utf-8'))
The encode_multipart=False instructs urllib3 to send the fields with content-type application/x-www-form-urlencoded; the redirect=False tells it not to follow the redirect, so that you can see the response to your initial request.
If you do want to complete the whole login process, however, you need to save the cookie from the first response and send it in the second request. You can do it with urllib3 (a minimal sketch follows), or:
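Here is a sketch of the urllib3 route (the cookie parsing is simplified and assumes a single Set-Cookie header, which is what this site sends):

import urllib3
from urllib.parse import urljoin

http = urllib3.PoolManager()
url = 'http://testing-ground.scraping.pro/login?mode=login'

# First request: log in, but don't follow the redirect so we can read the cookie.
login = http.request('POST', url, fields={'usr': 'admin', 'pwd': '12345'},
                     encode_multipart=False, redirect=False)

# Keep only the name=value part of the Set-Cookie header.
cookie = login.headers['Set-Cookie'].split(';')[0]

# Second request: follow the redirect manually, sending the cookie along.
next_url = urljoin(url, login.headers.get('Location', url))
welcome = http.request('GET', next_url, headers={'Cookie': cookie})
print(welcome.data.decode('utf-8'))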
Use the Requests library
I'm not sure if you have any particular reason to use urllib3. urllib3 will definitely work if you implement it well, but I would suggest trying the Requests library, which is much easier to use. For your case, the following code with Requests will work and get you to the welcome page:
import requests
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = requests.post(url, data={'usr':'admin','pwd':'12345'})
print(req.text)
import requests

# Note: the auth= tuple sends the credentials as HTTP Basic authentication.
auth_credentials = ("admin", "12345")
url = "http://testing-ground.scraping.pro/login?mode=login"
response = requests.post(url=url, auth=auth_credentials)
print(response.text)
I'm sending a POST request, with python-requests in Python 3.5, using:
r = requests.post(apiEndpoint, data=jsonPayload, headers=headersToMergeIn)
I can inspect the headers and body after sending the request like this:
print(r.request.headers)
print(r.request.body)
Is there any way to inspect the full request headers (not just the ones I'm merging in) and the body before sending the request?
Note: I need this for an API which requires me to build a hash from a subset of the headers, and another hash from the full body.
You probably want Prepared Requests.
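A minimal sketch (the URL and signature header are placeholders): build the request, prepare it through a Session so all defaults are merged in, inspect it, then send it.

import requests

session = requests.Session()

req = requests.Request(
    'POST',
    'https://api.example.com/endpoint',            # placeholder URL
    data='{"key": "value"}',
    headers={'Content-Type': 'application/json'},
)

# prepare_request merges in the session's defaults (cookies, etc.),
# so what you inspect here is exactly what will go on the wire.
prepped = session.prepare_request(req)
print(prepped.headers)  # the full set of headers
print(prepped.body)     # the exact body

# Compute your hashes from prepped.headers / prepped.body here, then e.g.:
prepped.headers['X-Signature'] = 'computed-hash'   # hypothetical header name

response = session.send(prepped)
print(response.status_code)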
The following is the code I use:
import unshortenit
unshortened_uri,status = unshortenit.unshorten('http://4sq.com/1iyfyI5')
print unshortened_uri
print status
The following is the output:
https://foursquare.com/edtechschools/checkin/53ac1e5f498e5d8d736ef3be?s=BlinbPzgFfShr0vdUnbEJUnOYYI&ref=tw
Invalid URL u'/tanyaavrith/checkin/53ac1e5f498e5d8d736ef3be?s=BlinbPzgFfShr0vdUnbEJUnOYYI&ref=tw': No schema supplied
Whereas if I use the same URL in a browser, it correctly redirects to the actual URL. Any idea why it's not working?
There's a 301 redirect chain:
From:
'http://4sq.com/1iyfyI5'
To:
'https://foursquare.com/edtechschools/checkin/53ac1e5f498e5d8d736ef3be?s=BlinbPzgFfShr0vdUnbEJUnOYYI&ref=tw'
To:
'/tanyaavrith/checkin/53ac1e5f498e5d8d736ef3be?s=BlinbPzgFfShr0vdUnbEJUnOYYI&ref=tw'
unshortenit uses requests, and requests can't understand the last, relative URL.
Update:
Actually, the requests library can handle HTTP redirects automatically with the requests.get method.
e.g.
import requests
r = requests.get('http://4sq.com/1iyfyI5')
r.status_code # 200
r.url # u'https://foursquare.com/tanyaavrith/checkin/53ac1e5f498e5d8d736ef3be?s=BlinbPzgFfShr0vdUnbEJUnOYYI&ref=tw'
But unshortenit does not want the overhead of an HTTP GET; instead it uses HTTP HEAD. If the response to the HTTP HEAD request has a 'Location' field in its header, unshortenit makes a new HTTP HEAD request to that location. The new request is isolated from the original request, so a relative URL no longer works.
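If you need a workaround, here is a minimal sketch that follows the redirect chain by hand with HEAD requests, resolving relative Location values against the current URL:

import requests
from urllib.parse import urljoin

def unshorten(url, max_hops=10):
    """Follow Location headers manually, resolving relative redirect URLs."""
    for _ in range(max_hops):
        response = requests.head(url, allow_redirects=False)
        location = response.headers.get('Location')
        if location is None:
            return url, response.status_code
        # urljoin makes a relative Location absolute against the current URL;
        # absolute Locations pass through unchanged.
        url = urljoin(url, location)
    return url, None  # gave up: too many redirects

print(unshorten('http://4sq.com/1iyfyI5'))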
Reference (from Wikipedia):
While the obsolete IETF standard RFC 2616 (HTTP 1.1) requires a complete absolute URI for redirection, the most popular web browsers tolerate the passing of a relative URL as the value for a Location header field. Consequently, the current revision of HTTP/1.1 makes relative URLs conforming.