Getting response 444 when making a request to a webpage - python

I am trying to make a request to a webpage using the code below, and I am getting response 444.
Is there anything I can do about it?
import requests
url = "https://www.pseudosite.com/"
response = requests.get(url)
print(response) # <Response [444]>
The http.dev website says the following:
When the 444 No Response status code is generated, the server returns
no information to the client and closes the HTTP Connection. This
error message can be found in the nginx logs and will not be sent to
the client. It is useful for dealing with malicious HTTP requests,
such as one that includes an illegal Host header.
I am trying to scrape that website using Python, but I am blocked at the first step.

I believe you need to add headers to view this webpage. If you open devtools in your browser and reload the page, you should see a GET request for it in the Network tab. Click it, open the 'Headers' tab, and copy the request headers into a Python dictionary that you pass to requests.get() via headers=.
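A minimal sketch of that idea, assuming the block is triggered by missing browser headers (the 444 could also come from nginx rules that headers cannot change, such as IP filtering). The header values below are hypothetical placeholders, and the names BROWSER_HEADERS and fetch are made up for illustration; replace the values with the ones your own browser shows in devtools.

```python
import requests

# Hypothetical example values: replace them with the request headers your
# browser actually sends (devtools > Network > the page's GET request > Headers).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch(url, headers=None):
    """GET url with browser-like headers and return the Response."""
    # timeout stops the script from hanging if the server drops the connection
    return requests.get(url, headers=headers or BROWSER_HEADERS, timeout=30)
```

Usage would be `response = fetch("https://www.pseudosite.com/")` followed by `print(response.status_code)`. If copying all headers works, you can then try removing them one at a time to find which ones the server actually checks.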

Related

View headers and body of a POST request made by Python Script

In my application, my API is at localhost:8000/api/v0.1/save_with_post.
I've also written a Python script that makes a POST request to this API.
### My script
import requests
url = 'http://localhost:8000/api/v0.1/save_with_post'  # requests needs the http:// scheme
myobj = {'key': 'value'}
x = requests.post(url, data=myobj)
Is it possible to view headers and body of the request in Chrome rather than debugging my application code?
You want Postman.
With Postman you can either generate a request to your service from Postman itself, or set up Postman as a proxy so you can see the requests that your API client is generating and the responses from the server.
If you want to view the response headers from the post request, have you tried:
>>> x.headers
Or you could add headers yourself to your POST request like so:
h = {"Content-Type": "application/xml"}  # add any other headers you need here
x = requests.post(url, data=myobj, headers=h)
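To see the request headers and body (not just the response headers) without Chrome, note that every Response returned by requests keeps the PreparedRequest it actually sent in its .request attribute. The sketch below formats one; describe_request is a hypothetical helper name, and the request is only prepared here so the example runs without a server. A request sent with requests.post will additionally show the default headers requests adds, such as User-Agent.

```python
import requests


def describe_request(prepared):
    """Format a PreparedRequest's method, URL, headers and body as text."""
    lines = [f"{prepared.method} {prepared.url}"]
    lines += [f"{name}: {value}" for name, value in prepared.headers.items()]
    body = prepared.body
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    lines += ["", body or ""]
    return "\n".join(lines)


url = "http://localhost:8000/api/v0.1/save_with_post"

# Build the question's POST without sending it (no server needed).
prepared = requests.Request("POST", url, data={"key": "value"}).prepare()
print(describe_request(prepared))

# After a real call, the request that was actually sent is on the response:
# x = requests.post(url, data={"key": "value"})
# print(describe_request(x.request))
```

For the form data above, the body is sent form-encoded as key=value with a Content-Type of application/x-www-form-urlencoded.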
Well, I don't know of a way to view the request in Chrome DevTools directly (I don't think there is one); however, I know of two alternatives for seeing the request body and response:
1 - Use Selenium with a Chrome WebDriver
This lets you drive Chrome from Python. You can then open a test page and run JavaScript in it to make your POST request.
See these for more information on how to do this:
1 https://selenium-python.readthedocs.io/getting-started.html
2 Getting the return value of Javascript code in Selenium
You will need the selenium-requests library to use the requests library with Selenium: https://pypi.org/project/selenium-requests/
2 - Use Wireshark
This program lets you see all the traffic going through your network card, so you can monitor all the requests going back and forth. However, because Wireshark captures everything your network card sends or receives, it may be hard to find the specific request you want.

Python GET request to an API URL returns 422 error, but the browser has no problems. Potential service worker problem?

I have noticed that for some websites' API URLs, the browser receives the response via a service worker, which has caused problems when scraping those APIs.
For example, consider the following:
https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand
The data appears when the URL is pasted into a browser. However, it gives me a 422 error when I try to automate the collection of that data in Python with the following code:
import requests
# API URL
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
# The response is always 422
response = requests.get(url)
I have noticed that calling the API URL in the browser returns a response via a service worker. So my question is: is there a way to get a 200 response using the Python requests library?
The server appears to require the Accept-Language header.
The code below now returns 200.
import requests
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
headers = {'Accept-Language': 'en-gb'}
response = requests.get(url, headers=headers)
(I found this by inspecting a successful request in the browser, copying all of its headers as-is into the Python request, and then removing them one by one.)
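The "remove headers one by one" procedure can be automated. This is a greedy sketch: minimal_headers is a hypothetical helper, and request_succeeds is any function you supply that sends a request with the given headers and reports success (one real request per call). It assumes headers matter mostly on their own; if the server needs a particular combination, a greedy pass may keep more headers than strictly necessary.

```python
def minimal_headers(headers, request_succeeds):
    """Greedily drop headers that are not needed for a successful request.

    headers: dict copied as-is from a working browser request.
    request_succeeds: callable taking a headers dict and returning True when
    the server answers as desired (for example with status 200).
    """
    if not request_succeeds(headers):
        raise ValueError("the full header set does not work to begin with")
    needed = dict(headers)
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if request_succeeds(trial):
            needed = trial  # not required, so leave it out from now on
    return needed


# Example predicate (hypothetical; makes one real request per call):
# import requests
# def returns_200(headers):
#     return requests.get(url, headers=headers, timeout=30).status_code == 200
# print(minimal_headers(browser_headers, returns_200))
```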

http request shows invalid information python

I'm trying to get data from
https://www.biman-airlines.com/bookings/flight_selection.aspx
For example, when I choose a flight from Dhaka (DAC) to Sylhet (ZYL), it goes to
https://www.biman-airlines.com/bookings/flight_selection.aspx?TT=RT&SS=&RT=&FL=on&DC=DAC&AC=ZYL&AM=2018-01&AD=09&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&RM=2018-01&RD=10&PA=1&PT=&PC=&PI=&CC=&NS=&CD=&FS=B4B9631
and shows the flight information.
But when I try to perform the same GET request using Python, it shows no info.
Here is my code:
import requests
print(requests.get('https://www.biman-airlines.com/bookings/flight_selection.aspx?TT=RT&SS=&RT=&FL=on&DC=DAC&AC=ZYL&AM=2018-01&AD=09&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&RM=2018-01&RD=10&PA=1&PT=&PC=&PI=&CC=&NS=&CD=&FS=').text)
What am I doing wrong?
Thanks in advance for any help!
The request result shows no info because there is no cookie data in the Python HTTP request.
If you check the HTTP request in the browser's debug window, you can see that a cookie is sent along with the request. The cookie identifies who the client is and tells the server "Hi, server, I'm a valid user".
A reasonable guess is that, in this biman-airlines.com case, the server checks the cookie and returns results only if the cookie is valid.
Thus, you need to add your Cookie header in the Python code:
import requests

# The cookie below is just an example; you would get your own cookie by visiting the website.
headers = {
    'Cookie': 'chocolateChip=nbixfy44dvziejjdxd2wmzs3; BNI_bg_zapways=0000000000000000000000009301a8c000005000; ASPSESSIONIDSQDCSSDT=PFJPADACFOGBDMONPBHPMFJN'
}
print(requests.get('https://www.biman-airlines.com/bookings/flight_selection.aspx?TT=RT&SS=&RT=&FL=on&DC=DAC&AC=ZYL&AM=2018-01&AD=09&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&DC=&AC=&AM=&AD=&RM=2018-01&RD=10&PA=1&PT=&PC=&PI=&CC=&NS=&CD=&FS=B4B9631', headers=headers).text)
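Rather than pasting a cookie copied from the browser (which is tied to one browser session), a requests.Session can collect cookies itself: it stores whatever cookies the server sets and sends them back on later requests. A minimal sketch, assuming (untested against the real site) that visiting the plain flight_selection.aspx URL first is enough for the server to hand out the cookies the search needs. SEARCH_URL, SEARCH_PARAMS and search_flights are names made up for illustration, and passing the query as params= is only a readability choice.

```python
import requests

SEARCH_URL = "https://www.biman-airlines.com/bookings/flight_selection.aspx"

# The non-empty query parameters from the URL in the question. A list of
# tuples keeps their order and allows repeated keys, so the empty DC=/AC=/...
# slots can be added back if the server turns out to need them. Whether FS is
# tied to a particular session is untested here.
SEARCH_PARAMS = [
    ("TT", "RT"), ("FL", "on"),
    ("DC", "DAC"), ("AC", "ZYL"), ("AM", "2018-01"), ("AD", "09"),
    ("RM", "2018-01"), ("RD", "10"), ("PA", "1"),
    ("FS", "B4B9631"),
]


def search_flights(params=SEARCH_PARAMS):
    """Return the HTML of the flight-selection page using one cookie session."""
    with requests.Session() as session:
        # First visit: lets the server set its cookies on the session
        # (assumption: this page hands out the cookies the search needs).
        session.get(SEARCH_URL, timeout=30)
        # Second request: the session sends those cookies back automatically.
        response = session.get(SEARCH_URL, params=params, timeout=30)
        return response.text
```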

Best way to ignore SSL certificate with request+python

I have an application deployed on a private server.
ip = "100.10.1.1"
I want to read the source code/web content of that page.
When I visit the page in a browser, it shows "Your connection is not private".
After I choose to proceed to the unsafe connection, it takes me to the page.
Below is the code I am using to read the HTML content. It does not give me the correct HTML content, even though the response is 200 OK.
I tried ignoring the SSL certificate with the code below, but it did not help:
import requests
url = "http://100.10.1.1"
requests.packages.urllib3.disable_warnings()
response = requests.get(url, verify=False)
print(response.text)  # Python 3 print function
Can someone share some ideas on how to proceed, or am I doing something wrong here?
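One thing worth checking: verify=False only has an effect on https:// URLs. With http:// no TLS connection is made, so no certificate is checked at all, and the browser warning suggests the page is actually served over HTTPS. A minimal sketch, assuming the server answers HTTPS on the default port; make_insecure_session is a hypothetical helper name. If you can export the server's certificate, passing its file path via verify= is safer than turning verification off.

```python
import requests
import urllib3

# Silence the warning requests emits for unverified HTTPS requests.
# Only do this for servers you control (e.g. a self-signed private host).
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def make_insecure_session():
    """Return a Session that skips TLS certificate verification."""
    session = requests.Session()
    session.verify = False  # applies to every request made with this session
    return session


# Note the https:// scheme: certificate checks only happen for HTTPS URLs.
URL = "https://100.10.1.1"

# session = make_insecure_session()
# response = session.get(URL, timeout=30)
# print(response.status_code)
# print(response.text)
```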

python requests.get() keep getting 401

The task I want to complete is very simple: make an HTTP GET request using Python.
Below is the code I used:
import requests
url = 'http://www.costcobusinessdelivery.com/AjaxWarehouseBrowseLookupView?storeId=11301&catalogId=11701&langId=-1&parentGeoNode=10112'
print(requests.get(url))
Then I got:
<Response [401]>
I am new to Python; can someone help? Thanks!
Update:
Based on the comments, the code seems to be okay, but I do get the 401 response. I suspect my company's network has some restrictions, yet I can access the page and get a valid response through a browser. Is there a way to get past my company's firewall/proxy, i.e., to make Python look like a browser? Thanks again!
If your browser is accessing the web via a proxy server, look that up on your browser settings and use that in python.
r = requests.get(url,
                 proxies={"http": "http://61.233.25.166:80"})
Your proxy server will have a different address.
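If some of the URLs you fetch are HTTPS, the proxies dictionary also needs an "https" entry, since requests picks the proxy by URL scheme. A minimal sketch; PROXY is a hypothetical address and proxies_for/get_via_proxy are made-up names. A proxy that needs a login is usually written as "http://user:password@host:port". As an alternative, requests also reads the HTTP_PROXY/HTTPS_PROXY environment variables.

```python
import requests

# Hypothetical address: use the proxy host and port from your browser or OS
# proxy settings.
PROXY = "http://proxy.example.com:8080"


def proxies_for(proxy=PROXY):
    """Map both URL schemes to the same proxy, the shape requests expects."""
    return {"http": proxy, "https": proxy}


def get_via_proxy(url, proxy=PROXY):
    # proxies= passed per request takes precedence over the
    # HTTP_PROXY/HTTPS_PROXY environment variables.
    return requests.get(url, proxies=proxies_for(proxy), timeout=30)
```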
