Can't authenticate via python requests - python

My company changed the URL for the company wiki. Since the change, authentication through python requests is broken under the new URL, although it worked before.
Under the new URL, I can only authenticate to the front page:
auth = ('myser', 'mypass')
url = 'https://msvcs.us.cworld.company.com/eailrr/User/Logon'
page = requests.get(url, auth=auth)
page
<Response [200]>
However, when I try to GET the page I actually need, I get a 401 error:
url = 'https://msvcs.us.cworld.company.com/wiki/rest/api/content'
page = requests.get(url, auth=auth)
page.raise_for_status
<bound method Response.raise_for_status of <Response [401]>>
Please note: I am unable to post actual company URLs in a public place. They are not accessible off the network anyway.
I am using the same exact auth for both requests. Why do I get a 200 for the first request, but a 401 for the second request?

Related

Python get requests for an API URL returns 422 error but on browser no problems. Potential service worker problem?

I have noticed that for some websites' API URLs, the browser receives the response via a service worker, which has caused problems when scraping those APIs.
For example, consider the following:
https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand
The data appears when the URL is pasted into a browser. However, it gives me a 422 error when I try to automate the collection of that data in Python with the following code:
import requests
#API url
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
#The response is always 422
response = requests.get(url)
I have noticed that calling the API URL in the browser returns a response via a service worker. Therefore my question is: is there a way to get a 200 response via the python requests library?
The server appears to require the Accept-Language header.
The code below now returns 200.
import requests
url = 'https://www.sephora.co.id/api/v2.3/products?filter[category]=makeup/face/bronzer&page[size]=30&page[number]=1&sort=sales&include=variants,brand'
headers = {'Accept-Language': 'en-gb'}
response = requests.get(url, headers=headers)
(Ascertained by checking a successful request via a browser, adding in all headers AS IS to the python request and then removing one by one.)
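The elimination procedure described above can be sketched in a few lines. This is only an illustration under assumptions: find_required_headers and the send callable are hypothetical names, the browser header values are examples, and fake_send stands in for the real server so the sketch runs without network access.

```python
def find_required_headers(send, headers):
    """Return the subset of `headers` that the server still needs.

    `send` is any callable that takes a headers dict and returns an
    HTTP status code (a hypothetical helper, not part of requests).
    """
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if send(trial) == 200:
            # The request still succeeds without this header, so drop it.
            required = trial
    return required


def fake_send(headers):
    """Stand-in for the real server: it insists on Accept-Language only."""
    return 200 if "Accept-Language" in headers else 422


# Headers copied from a successful browser request (example values).
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Accept-Language": "en-gb",
}

needed = find_required_headers(fake_send, browser_headers)
print(needed)
```

Against a real endpoint, send could be something like lambda h: requests.get(url, headers=h).status_code, at the cost of one request per header.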

I want to send a Python request to an ASP site but the site show access denied

The site URL is http://rajresults.nic.in/resbserx18.htm when the data is sent, but when the response comes back the URL has changed to an ASP page. So which URL should the request be sent to, the ASP one or the HTML one?
Request:
>>> import requests
>>> # data for get result
>>> para = {'roll_no': '2000000', 'B1': 'Submit'}
>>> # this is url where data is entered and get asp response
>>> url = 'http://rajresults.nic.in/resbserx18.htm'
>>> result = requests.post(url, data=para)
>>> result.text
Response
'The page you are looking for cannot be displayed because an invalid method (HTTP verb) is being used.'
Okay after a little bit of work, I found it's some issue with the headers.
I did some trial and error, and found that it checks to make sure the Host header is set.
To debug this, I just incrementally removed chrome's request headers and found which one this web service was particular about.
import requests
headers = {
    "Host": "rajresults.nic.in"
}
# Note that the POST goes to the .asp page, not the .htm form page.
r = requests.post('http://rajresults.nic.in/resbserx18.asp',
                  headers=headers,
                  data={'roll_no': 2000000, 'B1': 'Submit'})
print(r.text)

POST call with authentication to same URL, using Python Requests

I am making a POST call to a script at the following URL (internal to my company so can't be accessed from outside):
https://opsdata.mycompany.com/scripts/finance/finance.exe
The initial site is an HTML page with text boxes for entering data, and its form posts to the above URL. However, it redirects to a login page, also at the above URL, that has text boxes for a username and password. I submit data to the login page using the following code:
post_url_finance = 'https://opsdata.*****.com/scripts/finance/finance.exe'
s = requests.session()
s.auth = {'USER_NAME': '*****', 'PASSWORD': '*****'}
proxies = {'http': 'http://proxy-***.****.com'}
To do the authentication, I am using:
pageCert = requests.post(post_url_finance, proxies=proxies, verify=False)
This gives me a response:
<Response [200]>
C:\Python27\lib\site-packages\urllib3\connectionpool.py:768: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
However, I need to send the data which I am querying for using this info:
values_finance = {'EMPLOYEE_TOTAL': '-----'}
When I make the POST call a second time using:
page = requests.post(post_url_finance, data=values_finance, proxies=proxies, verify=False)
I am getting the same response back.
<Response [200]>
How do I make the second call to Post retrieve the data I want?
All status_code=200 means is that the website "successfully rendered a page". A lot of the time, sites do not make their invalid-login pages or error pages return anything else.
You need to look at pageCert.content. I don't think you are actually logging in (maybe you are). On your second call you need to go through the session:
page = s.post(url,...)
To get API data you probably want to use json:
page_data = s.post(url,...).json()
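Since a 200 status says nothing about whether the login worked, one way to inspect the returned content is to check whether the HTML still contains a password field. The following is a minimal sketch using only the standard library. The name looks_like_login_page and both sample pages are made up for illustration, and the heuristic assumes the login form uses an input of type password.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether any <input type="password"> tag was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a page that asks for a password is probably a login page."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# Made-up sample pages standing in for the text of a real response.
login_html = (
    '<form><input name="USER_NAME">'
    '<input type="password" name="PASSWORD"></form>'
)
data_html = "<table><tr><td>EMPLOYEE_TOTAL</td><td>42</td></tr></table>"

print(looks_like_login_page(login_html))
print(looks_like_login_page(data_html))
```

With a real response, the function would be called on pageCert.text; a True result on the second call suggests the session was never authenticated.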
This is the first time I have seen this, but the docs gave me some guidance. It looks like you are printing the response object rather than its data. Here is an example:
r = requests.get('https://www.google.com')
print(r)
# <response 200>
# Now if I write the text:
print(r.text)
# A lot of html comes out
As @Joran Beasley says, you may just need print(r.json()) to see what you need (json is a method, so it has to be called). Ideally a 200 response code is a good sign, since a failed authentication would normally produce a 401/403 error.
The warning has more to do with certificate verification for the mycompany.com end (verify=False turns it off), as you can read in the urllib3 docs.

Retrieve OAuth code in redirect URL provided as POST response

Python newbie here, so I'm sure this is a trivial challenge...
Using Requests module to make a POST request to the Instagram API in order to obtain a code which is used later in the OAuth process to get an access token. The code is usually accessed on the client-side as it's provided at the end of the redirect URL.
I have tried using the history attribute of the Requests response, like this (client ID is altered for this post):
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
OAuth_AccessRequest = requests.post(OAuthURL)
ResHistory = OAuth_AccessRequest.history
for resp in ResHistory:
    print resp.status_code, resp.url
print OAuth_AccessRequest.status_code, OAuth_AccessRequest.url
But the URLs this returns are not revealing the code number string, instead, the redirect just looks like this:
302 https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.dashboard.com/whathappened&response_type=code
200 https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%cb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode
Where if you do this on the client side, using a browser, code would be replaced with the actual number string.
Is there a method or approach I can add to the POST request that will allow me to have access to the actual redirect URL string that appears in the web browser?
It should work in a browser if you are already logged in at Instagram. If you are not logged in you are redirected to a login page:
https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%3Dcb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode
Your Python client is not logged in and so it is also redirected to Instagram's login page as shown by the value of OAuth_AccessRequest.url :
>>> import requests
>>> OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
>>> OAuth_AccessRequest = requests.get(OAuthURL)
>>> OAuth_AccessRequest
<Response [200]>
>>> OAuth_AccessRequest.url
u'https://instagram.com/accounts/login/?force_classic_login=&next=/oauth/authorize/%3Fclient_id%3Dcb0096f08a3848e67355f%26redirect_uri%3Dhttps%3A//www.smashboarddashboard.com/whathappened%26response_type%3Dcode'
So, to get to the next step, your Python client needs to login. This requires that the client extract and set fields to be posted back to the same URL. It also requires cookies and that the Referer header be properly set. There is a hidden CSRF token that must be extracted from the page (you could use BeautifulSoup for example), and form fields username and password must be set. So you would do something like this:
import requests
from bs4 import BeautifulSoup
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
session = requests.session() # use session to handle cookies
OAuth_AccessRequest = session.get(OAuthURL)
soup = BeautifulSoup(OAuth_AccessRequest.content, 'html.parser') # name the parser explicitly
form = soup.form
login_data = {form.input.attrs['name'] : form.input['value']}
login_data.update({'username': 'your username', 'password': 'your password'})
headers = {'Referer': OAuth_AccessRequest.url}
login_url = 'https://instagram.com{}'.format(form.attrs['action'])
r = session.post(login_url, data=login_data, headers=headers)
>>> r
<Response [400]>
>>> r.json()
{u'error_type': u'OAuthException', u'code': 400, u'error_message': u'Invalid Client ID'}
Which looks like it will work once provided a valid client ID.
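The snippet above copies only the first input of the form into login_data. If a login form carries more than one hidden field, all of them usually need to be posted back. Here is a sketch that collects every hidden input with the standard library instead of BeautifulSoup. The name hidden_form_fields is hypothetical, and the sample markup (including the field name csrfmiddlewaretoken) is invented for illustration, not copied from Instagram's page.

```python
from html.parser import HTMLParser


class _HiddenInputCollector(HTMLParser):
    """Collects the name/value pair of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_form_fields(html):
    """Return a dict of all hidden form fields found in `html`."""
    collector = _HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Invented login form, used only to exercise the parser.
sample_form = """
<form method="post" action="/accounts/login/">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="hidden" name="next" value="/oauth/authorize/">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

login_data = hidden_form_fields(sample_form)
login_data.update({"username": "your username", "password": "your password"})
print(login_data)
```

The resulting login_data dict has the same role as the one built above and could be passed to session.post in the same way.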
As an alternative, you could look at mechanize which will handle the form submission for you, including the hidden CSRF field:
import mechanize
OAuthURL = "https://api.instagram.com/oauth/authorize/?client_id=cb0096f08a3848e67355f&redirect_uri=https://www.smashboarddashboard.com/whathappened&response_type=code"
br = mechanize.Browser()
br.open(OAuthURL)
br.select_form(nr=0)
br.form['username'] = 'your username'
br.form['password'] = 'your password'
r = br.submit()
response = r.read()
However, this does not work as written because the Referer header is not being set. You could use this method if you can figure out a solution to that.
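Once a logged-in client does reach the final redirect, the code still has to be read out of that URL. The sketch below assumes the code arrives as a query parameter named code on the redirect_uri, as the question describes. The function name extract_oauth_code and the value EXAMPLECODE are made up, and the login URL is a shortened version of the one shown earlier.

```python
from urllib.parse import urlparse, parse_qs


def extract_oauth_code(final_url):
    """Return the `code` query parameter of a redirect URL, or None."""
    query = parse_qs(urlparse(final_url).query)
    values = query.get("code")
    return values[0] if values else None


# Hypothetical final redirect after a successful login and authorization.
redirect_ok = "https://www.smashboarddashboard.com/whathappened?code=EXAMPLECODE"

# Shortened login-page URL: `code` only appears inside the encoded `next`
# value, so there is no top-level `code` parameter to return.
redirect_login = (
    "https://instagram.com/accounts/login/?force_classic_login=&next="
    "/oauth/authorize/%3Fclient_id%3Dcb0096f08a3848e67355f%26response_type%3Dcode"
)

print(extract_oauth_code(redirect_ok))
print(extract_oauth_code(redirect_login))
```

A None result is a quick signal that the client ended up on the login page rather than on the redirect_uri.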

Fetch a page with cookies using Python requests library

I'm studying the requests library (http://docs.python-requests.org/en/latest/),
and ran into a problem fetching a page with cookies using requests.
For example:
url2= 'https://passport.baidu.com'
parsedCookies={'PTOKEN': '412f...', 'BDUSS': 'hnN2...', ...} # cookie values are replaced by ... for privacy
req = requests.get(url2, cookies=parsedCookies)
text=req.text.encode('utf-8','ignore')
f=open('before.html','w')
f.write(text)
f.close()
req.close()
When I use the code above to fetch the page, it saves the login page to 'before.html' instead of the logged-in page, which suggests that I have not actually logged in successfully.
But if I use urllib2 to fetch the page, it works as expected.
parsedCookies="PTOKEN=412f...;BDUSS=hnN2...;..." # different format but same content as the cookies above
req = urllib2.Request(url2)
req.add_header('Cookie', parsedCookies)
ret = urllib2.urlopen(req)
f=open('before_urllib2.html','w')
f.write(ret.read())
f.close()
ret.close()
When I use this code, it saves the logged-in page in before_urllib2.html.
--
Are there any mistakes in my code?
Any reply would be appreciated.
You can use a Session object to get what you want:
url2='http://passport.baidu.com'
session = requests.Session() # create a Session object
cookie = requests.utils.cookiejar_from_dict(parsedCookies)
session.cookies.update(cookie) # set the cookies of the Session object
req = session.get(url2, allow_redirects=True) # headers= omitted here: no headers dict is defined in this snippet
If you use the requests.get function, it doesn't send cookies to the redirected page. If you use Session().get instead, it maintains and sends cookies for all HTTP requests, which is exactly what the concept of a "session" means.
Let me try to explain what happens here:
When I send cookies to http://passport.baidu.com/center and set the parameter allow_redirects to False, the returned status code is 302 and one of the response headers is 'location': '/center?_t=1380462657' (this is a dynamic value generated by the server; replace it with what you get from the server):
url2= 'http://passport.baidu.com/center'
req = requests.get(url2, cookies=parsedCookies, allow_redirects=False)
print req.status_code # output 302
print req.headers
But when I set the parameter allow_redirects to True, it still doesn't reach the redirected page (http://passport.baidu.com/center?_t=1380462657) and the server returns the login page. The reason is that requests.get doesn't send cookies to the redirected page, here http://passport.baidu.com/center?_t=1380462657, so we cannot log in successfully. That is why we need the Session object.
If I set url2 = http://passport.baidu.com/center?_t=1380462657 directly, it returns the page you want. One solution is to use the above code to get the dynamic location value and form a path to your account, such as http://passport.baidu.com/center?_t=1380462657; then you can get the desired page.
url2= 'http://passport.baidu.com' + req.headers.get('location')
req = session.get(url2, cookies=parsedCookies, allow_redirects=True )
But this is cumbersome, so when dealing with cookies, the Session object does an excellent job for us!
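If the redirect is followed by hand as in the last snippet, plain string concatenation only works while the Location header is a path starting with a slash. A small sketch using urljoin from the standard library, which also copes with absolute Location values; resolve_location is a hypothetical helper name and the second URL is an example value.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Resolve a (possibly relative) Location header against the request URL."""
    return urljoin(request_url, location)


# Relative Location value, as in the 302 response described above.
print(resolve_location("http://passport.baidu.com/center",
                       "/center?_t=1380462657"))

# An absolute Location value is returned unchanged.
print(resolve_location("http://passport.baidu.com/center",
                       "https://example.com/login"))
```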
