I am trying to post multipart/form-data using the requests library. According to the website, on submitting the form you are redirected to a page where your data is created, but when I try it with the requests library it gives 200 as the response when it should give 302. Could anyone help me with this? I don't know what I am doing wrong.
By default, requests follows 302 redirect responses. You can disable this as follows:
r = requests.get('http://github.com/', allow_redirects=False)
See https://requests.kennethreitz.org/en/master/user/quickstart/#redirection-and-history
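For the multipart case in the question, here is a minimal sketch (the URL, field, and file names are hypothetical) that sends the form without following the redirect, so the 302 itself becomes visible:
import requests

url = 'http://example.com/submit'                  # hypothetical endpoint
files = {'document': open('report.pdf', 'rb')}     # file part of the multipart body
data = {'name': 'alice'}                           # plain form fields

# Don't follow the redirect, so the 302 can be inspected directly.
resp = requests.post(url, data=data, files=files, allow_redirects=False)
print(resp.status_code)                  # 302 if the server redirects
print(resp.headers.get('Location'))      # the redirect target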
I am trying to log into a website by passing a username and password. It says the session cookie is missing. I am a beginner with APIs and I don't know if I have missed something here. The website is http://testing-ground.scraping.pro/login
import urllib3
http = urllib3.PoolManager()
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = http.request('POST', url, fields={'usr':'admin','pwd':'12345'})
print(req.data.decode('utf-8'))
There are two issues in your code that prevent you from logging in successfully.
The content-type issue
In the code you are using urllib3 to send data of content-type multipart/form-data. The website, however, seems to only accept the content-type application/x-www-form-urlencoded.
Try the following cURL commands:
curl -v -d "usr=admin&pwd=12345" "http://testing-ground.scraping.pro/login?mode=login"
curl -v -F "usr=admin" -F "pwd=12345" "http://testing-ground.scraping.pro/login?mode=login"
For the first one, the content-type in your request header is application/x-www-form-urlencoded, so the website takes it and logs you in (with a 302 Found response).
The second one, however, sends the data with content-type multipart/form-data. The website doesn't accept it and therefore rejects your login request (with a 200 OK response).
The cookie issue
Another issue is that urllib3 follows redirects by default. More importantly, cookies are not handled (i.e. stored and sent in subsequent requests) by urllib3 by default. Thus, the second request won't contain the cookie tdsess=TEST_DRIVE_SESSION, and the website returns the message that you're not logged in.
If you only care about the login request, you can try the following code:
import urllib3
http = urllib3.PoolManager()
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = http.request('POST', url, fields={'usr':'admin','pwd':'12345'}, encode_multipart=False, redirect=False)
print(req.data.decode('utf-8'))
The encode_multipart=False instructs urllib3 to send data with content-type application/x-www-form-urlencoded; the redirect=False tells it not to follow the redirect, so that you can see the response of your initial request.
If you do want to complete the whole login process, however, you need to save the cookie from the first response and send it in the second request. You can do that with urllib3 (see the sketch below), or use the Requests library, as described in the next section.
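A minimal sketch of the manual approach, assuming the site redirects via a Location header and sets the session cookie shown above:
import urllib3
from urllib.parse import urljoin

http = urllib3.PoolManager()
login_url = 'http://testing-ground.scraping.pro/login?mode=login'

# Step 1: log in without following the redirect, so Set-Cookie stays visible.
resp = http.request('POST', login_url,
                    fields={'usr': 'admin', 'pwd': '12345'},
                    encode_multipart=False, redirect=False)
cookie = resp.headers.get('Set-Cookie', '').split(';')[0]  # e.g. tdsess=TEST_DRIVE_SESSION

# Step 2: follow the redirect manually, sending the cookie back.
next_url = urljoin(login_url, resp.headers.get('Location', ''))
resp2 = http.request('GET', next_url, headers={'Cookie': cookie})
print(resp2.data.decode('utf-8'))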
Use the Requests library
I'm not sure if you have any particular reason to use urllib3. urllib3 will definitely work if you implement it correctly, but I would suggest trying the Requests library, which is much easier to use. For your case, the following code with Requests will work and get you to the welcome page:
import requests
url = 'http://testing-ground.scraping.pro/login?mode=login'
req = requests.post(url, data={'usr':'admin','pwd':'12345'})
print(req.text)
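Requests also stores and resends cookies across the redirect chain for you. If you need to make further requests after logging in, a Session carries the cookie automatically; here is a small sketch (the welcome-page URL below is an assumption):
import requests

s = requests.Session()                    # persists cookies across requests
s.post('http://testing-ground.scraping.pro/login?mode=login',
       data={'usr': 'admin', 'pwd': '12345'})
r = s.get('http://testing-ground.scraping.pro/login?mode=welcome')  # assumed URL
print(r.text)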
Alternatively, the credentials can be sent as HTTP Basic Auth, although this only helps if the site accepts Basic Auth rather than form fields:
import requests
auth_credentials = ("admin", "12345")
url = "http://testing-ground.scraping.pro/login?mode=login"
response = requests.post(url=url, auth=auth_credentials)
print(response.text)
When I'm doing a GET request everything works fine but when I try a POST request, it returns 404.
I'm working with this API that interacts with Nagios:
https://github.com/EyesOfNetworkCommunity/eonapi
Here's my Python GET request:
import requests
r = requests.get('https://device/eonapi/getAuthenticationStatus?username=test&apiKey=49fd4f56qs4dfs2sdf4')
print(r.json())
print(r.status_code)
And the result:
{'api_version': '2.4.2', 'http_code': '200 OK', 'status': 'authorized'}
200
The POST request when I'm trying to get information about a monitored host :
import requests
r = requests.post('https://device/eonapi/getHost?username=test&apiKey=49fd4f56qs4dfs2sdf4', data = {'hostName':'test1'})
print(r.status_code)
Result:
404
I don't know what I'm doing wrong; I tried these requests with PHP and cURL, but I still get the same results.
You probably have to use a GET request, because the URL says "getHost" and you're trying to get data. Have you tried using GET on that URL?
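A sketch of that suggestion, carrying the parameters over from the question (whether getHost accepts hostName as a query parameter is an assumption):
import requests

r = requests.get('https://device/eonapi/getHost',
                 params={'username': 'test',
                         'apiKey': '49fd4f56qs4dfs2sdf4',
                         'hostName': 'test1'})  # assumed query parameter
print(r.status_code)
print(r.json())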
I have noticed that for some websites' API URLs, the response in the browser is served via a service worker, which has caused problems in scraping those APIs.
For example, consider the following:
https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand
The data appears when the URL is pasted into a browser. However, it gives me a 422 error when I try to automate the collection of that data in Python with the following code:
import requests
#API url
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
#The response is always 422
response = requests.get(url)
I have noticed that calling the API URL in the browser returns the response via a service worker. My question, therefore: is there a way to get a 200 response via the Python requests library?
The server appears to require the Accept-Language header.
The code below now returns 200.
import requests
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
headers = {'Accept-Language': 'en-gb'}
response = requests.get(url, headers=headers)
(Ascertained by checking a successful request via a browser, adding all of its headers as-is to the Python request, and then removing them one by one.)
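A minimal sketch of that elimination process (the header names and values below are illustrative, not the actual browser set):
import requests

url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
browser_headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'application/json',
    'Accept-Language': 'en-gb',
}

# Drop one header at a time; if the request still succeeds,
# that header was not required.
for name in list(browser_headers):
    trimmed = {k: v for k, v in browser_headers.items() if k != name}
    status = requests.get(url, headers=trimmed).status_code
    print('without', name + ':', status)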
I am trying to submit a multipart POST request in Python. I looked around and found 2 variations:
Using 'requests' (http://docs.python-requests.org/en/latest/)
Using urllib2 (https://docs.python.org/2/library/urllib2.html#module-urllib2)
I tried both of them and am able to submit the request successfully.
Below is the sample code for both:
----------requests--------------
resp = requests.post(submiturl, files=multipart_form_data, headers=headers,timeout=5)
where multipart_form_data contains my file object as well as string parameters
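For illustration, such a dictionary might look like this (the field names and file are hypothetical); in the files= mapping of requests, a tuple whose first element is None is sent as a plain string field:
multipart_form_data = {
    'file': ('report.pdf', open('report.pdf', 'rb'), 'application/pdf'),
    'description': (None, 'quarterly report'),  # plain string parameter
}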
---------------urllib2------------
from poster.encode import multipart_encode, MultipartParam
from poster.streaminghttp import register_openers
import urllib2

register_openers()  # lets urllib2 stream the multipart body
items = [MultipartParam(name, value),  # string parameters
         MultipartParam('file', filename=inputFile, fileobj=open(inputFile, 'rb'))]
datagen, headers = multipart_encode(items)
res = urllib2.urlopen(urllib2.Request(submiturl, datagen, headers))
My Question:
Which one should I use?
Correct me if I am wrong, but I have seen that when submitting with urllib2 I get an HTTPError for response codes like 500. While using requests, however, no exception is raised for 500-level responses; instead I have to add the check manually:
resp.raise_for_status()
or:
if resp.status_code != 200: raise Exception(...)
Is this correct, or am I missing something?
Thanks!
Response.raise_for_status() raises an HTTPError for response codes in the 4xx and 5xx ranges. The source is very clear and readable.
You'll get a 2xx response for successful requests, but you may also want to consider other response codes, for example redirects.
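For example, a small sketch using httpbin.org endpoints for demonstration:
import requests

resp = requests.get('https://httpbin.org/status/500')
try:
    resp.raise_for_status()             # raises requests.HTTPError for 4xx/5xx
except requests.HTTPError as err:
    print('request failed:', err)

# Redirects are followed by default; the chain is kept in resp.history.
resp = requests.get('https://httpbin.org/redirect/1')
print(resp.status_code, [r.status_code for r in resp.history])  # 200 [302]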
The task I want to complete is very simple: make an HTTP GET request using Python.
Below is the code I used:
import requests

url = 'http://www.costcobusinessdelivery.com/AjaxWarehouseBrowseLookupView?storeId=11301&catalogId=11701&langId=-1&parentGeoNode=10112'
requests.get(url)
Then I got:
<Response [401]>
I am new to Python. Can someone help? Thanks!
Update:
Based on the comments, it seems the code is okay, but I do get the 401 response. I suspect my company's network has some restrictions, since I can access the URL and get a valid response through a browser. Is there a way to bypass my company's firewall/proxy, or whatever it is? Just to pretend that I am using a browser in Python? Thanks again!
If your browser is accessing the web via a proxy server, look it up in your browser settings and use that in Python.
r = requests.get(url, proxies={"http": "http://61.233.25.166:80"})
Your proxy server will have a different address.
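If the 401 comes from the site itself rather than a proxy, sending a browser-like User-Agent header sometimes helps (the value below is illustrative):
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
r = requests.get(url, headers=headers)
print(r.status_code)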