I am making a POST call to a script at the following URL (internal to my company, so it can't be accessed from outside):
https://opsdata.mycompany.com/scripts/finance/finance.exe
The initial site is an HTML page with text boxes for you to enter data into, and it has a POST action to the above URL. However, it redirects to a login page, also at the above URL, with text boxes for a username and password. I submit data to the login page using the following code:
import requests

post_url_finance = 'https://opsdata.*****.com/scripts/finance/finance.exe'
s = requests.session()
s.auth = {'USER_NAME': '*****', 'PASSWORD': '*****'}
proxies = {'http': 'http://proxy-***.****.com'}
To do the authentication, I am using:
pageCert = requests.post(post_url_finance, proxies=proxies, verify=False)
This gives me a response:
<Response [200]>
C:\Python27\lib\site-packages\urllib3\connectionpool.py:768: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
However, I need to send the data that I am querying for, using this info:
values_finance = {'EMPLOYEE_TOTAL': '-----'}
When I make the POST call a second time using:
page = requests.post(post_url_finance, data=values_finance, proxies=proxies, verify=False)
I am getting the same response back.
<Response [200]>
How do I make the second POST call retrieve the data I want?
All a status_code of 200 means is that the website "successfully rendered a page". A lot of the time, sites don't make their invalid-login or error pages return anything else.
You need to look at pageCert.content .... I don't think you are actually logging in (maybe you are) ... on your second call you need to do
page = s.post(url,...)
To get API data you probably want to use JSON:
page_data = s.post(url,...).json()
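Putting those pieces together, here is a minimal sketch of the whole flow. It assumes the login form expects USER_NAME and PASSWORD as form fields (not as requests' auth) and that the data endpoint returns JSON; the example.com hosts below stand in for the redacted internal URLs.

import requests

# Placeholders for the redacted internal host and proxy.
post_url_finance = 'https://opsdata.example.com/scripts/finance/finance.exe'
proxies = {'http': 'http://proxy.example.com'}

s = requests.Session()

# Log in with the session so any cookies the server sets are kept.
# Assumption: the login form posts USER_NAME and PASSWORD as form data.
login = s.post(post_url_finance,
               data={'USER_NAME': 'user', 'PASSWORD': 'pass'},
               proxies=proxies, verify=False)
print(login.content)  # confirm this is not the login page again

# Reuse the same session for the query so the login cookies are sent along.
page = s.post(post_url_finance,
              data={'EMPLOYEE_TOTAL': '-----'},
              proxies=proxies, verify=False)
print(page.json())  # assumes the endpoint returns JSON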
This is the first time I've seen this, but the docs gave me some guidance... It looks like you're just printing out the response but not the data. Here's an example:
r = requests.get('https://www.google.com')
print(r)
# <Response [200]>
# Now if I write the text:
print(r.text)
# A lot of html comes out
As @Joran Beasley says, you may need to use print(r.json()) to see what you need. In the ideal scenario, getting a 200 response code is a good thing; you would be getting a 401/403 error if the authentication had failed.
The warning has more to do with the authenticity of the certs on the mycompany.com end; you can read about it in the urllib3 docs.
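If you want to get rid of that warning properly, you can point requests at a CA bundle instead of passing verify=False. This reuses the variables from the question; the bundle path is a hypothetical example:

# Hypothetical path to your company's internal CA bundle.
page = requests.post(post_url_finance,
                     data=values_finance,
                     proxies=proxies,
                     verify='/etc/ssl/certs/company-ca.pem')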
My company changed the URL for the company wiki. Now, when I use the new URL, authentication is broken in python requests, where it was working before the change.
Under the new URL, I can only authenticate to the front page:
auth = ('myser', 'mypass')
url = 'https://msvcs.us.cworld.company.com/eailrr/User/Logon'
page = requests.get(url, auth=auth)
page
<Response [200]>
However, when I try to GET for the page I actually need I get a 401 error:
url = 'https://msvcs.us.cworld.company.com/wiki/rest/api/content'
page = requests.get(url, auth=auth)
page.raise_for_status
<bound method Response.raise_for_status of <Response [401]>>
Please note: I am unable to post actual company URLs in a public place. They are not accessible off the network anyway.
I am using the same exact auth for both requests. Why do I get a 200 for the first request, but a 401 for the second request?
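One pattern worth trying (a sketch only, since the Logon page's real form field names are unknown): /eailrr/User/Logon looks like a form-based login endpoint, so the wiki API may expect the session cookie that login sets rather than HTTP basic auth. Posting the credentials to the Logon page with a Session and reusing that session for the API call would look something like this:

import requests

s = requests.Session()

# Hypothetical form field names; inspect the Logon page's HTML for the real ones.
login = s.post('https://msvcs.us.cworld.company.com/eailrr/User/Logon',
               data={'username': 'myser', 'password': 'mypass'})
login.raise_for_status()

# The session now carries whatever cookies the Logon endpoint set.
page = s.get('https://msvcs.us.cworld.company.com/wiki/rest/api/content')
print(page.status_code)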
The site URL is http://rajresults.nic.in/resbserx18.htm. When I send data, the URL in the response changes to an ASP page. Which URL should the request be sent to, the .asp one or the .htm one?
Request:
import requests

# data for getting the result
para = {'roll_no': '2000000', 'B1': 'Submit'}
# this is the URL where the data is entered; the response comes back from ASP
url = 'http://rajresults.nic.in/resbserx18.htm'
result = requests.post(url, data=para)
print(result.text)
Response
'The page you are looking for cannot be displayed because an invalid method (HTTP verb) is being used.'
Okay, after a little bit of work, I found it's an issue with the headers.
I did some trial and error, and found that it checks to make sure the Host header is set.
To debug this, I incrementally removed Chrome's request headers and found which one this web service was particular about.
import requests

headers = {
    "Host": "rajresults.nic.in"
}
r = requests.post('http://rajresults.nic.in/resbserx18.asp',
                  headers=headers,
                  data={'roll_no': 2000000, 'B1': 'Submit'})
print(r.text)
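That trial-and-error process can also be automated. Below is a sketch that drops one header at a time and reports the status code, so you can see which removal breaks the request; the header set is a trimmed, hypothetical copy of what Chrome sends:

import requests

url = 'http://rajresults.nic.in/resbserx18.asp'
data = {'roll_no': 2000000, 'B1': 'Submit'}

# Hypothetical starting set, copied from the browser's dev tools.
chrome_headers = {
    "Host": "rajresults.nic.in",
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
}

# Drop one header at a time and see which removal changes the response.
for name in chrome_headers:
    trimmed = {k: v for k, v in chrome_headers.items() if k != name}
    r = requests.post(url, headers=trimmed, data=data)
    print(name, r.status_code)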
I've used requests with good results, but with this particular URL I get a redirect loop error.
import requests
from lxml import html

s = requests.Session()
page = s.get('http://pe.usps.gov/text/pub28/28apc_002.htm')
tree = html.fromstring(page.content)
street_type = tree.xpath(r"//*[@id='ep533076']/tbody/tr[2]/td[1]/p/a")
print(street_type)
I'm wondering specifically if there is a way to assign headers for the request so as to avoid the redirect. I've tested the actual URL and it looks valid.
Thanks
The redirect is a response sent by the server. It is typically an HTTP 301 or 302 response, which says "hey, I know what you are looking for, it is over here..." and sends you to a new place to look. Yes, these can be chained together, and yes, you can end up in loops. That is what the max redirect limit is for.
You can set the number of allowable redirects in requests using:
s.max_redirects = 50 # the default is 30
But this will not solve the issue. In this particular case, the server is checking what kind of browser you are using and redirects you when it doesn't find what it is looking for. You can imitate a browser by adding a user-agent field to the header.
Recommended usage: sets the header for just the single request:
s.get(url, headers={'user-agent': 'My app'})
# returns:
<Response [200]>
Original posting: sets the header for the entire session, which is not necessarily what you want.
s.headers = {'user-agent': 'some app'}
s.get('http://pe.usps.gov/text/pub28/28apc_002.htm')
# returns:
<Response [200]>
I am trying to log in with a post request using the python requests module on a MediaWiki page:
import requests
s = requests.Session()
s.auth = ('....', '....')
url = '.....'
values = {'wpName': '....',
          'wpPassword': '.....'}
req = s.post(url, values)
print(req.content)
I can't tell from the return value of the POST request whether the login attempt was successful. Is there something I can do to check this? Thanks.
Under normal circumstances I would advise you to go the mechanize way and make things way too easy for yourself, but since you insist on requests, let us use that.
You have obviously got the values right, but I personally don't use the auth attribute. So, try this instead.
import requests
url = 'https://example.com/wiki/index.php?title=Special:UserLogin'
values = {
    'wpName': 'myc00lusername',
    'wpPassword': 'Myl33tPassw0rd12'
}
session = requests.session()
r = session.post(url, data=values)
print(r.cookies)
This is what I used to solve this.
After getting a successful login, I read the text from response.text and compared it to the text I got when submitting incorrect information.
The reason I did this is that validation is done on the server side, and Requests will get a 200 OK response whether the login was successful or not.
So I ended up adding this line:
logged_in = "Incorrect Email or password" not in response.text
Typically such an authentication mechanism is implemented using HTTP cookies. You might be able to check for the existence of a session cookie after you've authenticated successfully. You'll find the cookie in the HTTP response headers or in the session's cookie attribute, s.cookies.
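A quick sketch of that check, assuming the wiki sets a session cookie on a successful login (the actual cookie name varies per installation, so print s.cookies first and look at what came back):

import requests

url = 'https://example.com/wiki/index.php?title=Special:UserLogin'  # placeholder
values = {'wpName': '....', 'wpPassword': '....'}

s = requests.Session()
s.post(url, data=values)

print(s.cookies)  # inspect what the server actually set
# Hypothetical check: look for any cookie with "session" in its name.
logged_in = any('session' in c.name.lower() for c in s.cookies)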
My intention is to log into a site and then access a protected image from a python script. I have both legal and working access from a browser.
This is what I have now.
import requests
s = requests.Session()
s.get('*domain*')
r_login =s.post('*domain*/user.php', headers={'cmd': 'login', 'loginname': '***', 'password': '***' })
print (s.cookies)
print (r_login.status_code)
r_img = s.get('*domain*/*protectedimage*.jpg')
print (r_img.status_code)
print (r_img.cookies)
print (s.cookies['PHPSESSID'])
Output:
<<class 'requests.cookies.RequestsCookieJar'>[<Cookie PHPSESSID=664b0842085b847a04d415a22e013ad8 for *domain*/>]>
200
403
<<class 'requests.cookies.RequestsCookieJar'>[]>
664b0842085b847a04d415a22e013ad8
I am sure I can successfully log in, because I once downloaded the HTML file after doing so, and it showed the logged-in version of the page. But my problem is that my PHPSESSID cookie does not seem to get passed along, so I get a 403 error back, even though I clearly have it in my session. I have also tried adding the cookie manually to my "r_img" line, and it made no difference: I still get an empty CookieJar and a 403 error back. Would this not be possible with only the requests module? Did I overlook something? Excuse me for not being quite familiar with HTTP requests.
I'm using Python 3.4 just for sake of clarity.
You are passing in your form data as HTTP headers. A POST login form should send form elements in the data parameter instead:
r_login = s.post('*domain*/user.php',
data={'cmd': 'login', 'loginname': '***', 'password': '***' })
Do inspect the returned body, not just the status code. Your POST request was accepted by the server (200 OK), but since no login information was posted, the body will most likely tell you something like "login incorrect, please try again".
The server most likely cleared the cookie again seeing as it was not a valid login session when you requested the image. The 403 response probably contains a Set-Cookie header for PHPSESSID with a date in the past to clear it.
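You can check that theory by looking at the headers on the 403 response from the question's code:

print(r_img.headers.get('Set-Cookie'))  # an expiry date in the past would confirm it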
Try doing it like this, as per the python-requests docs:
payload = {'cmd': 'login', 'loginname': '***', 'password': '***'}
url = '*domain*/user.php'
s = requests.Session()
s.post(url, data=payload)
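With the login actually sent as form data, the image request from the question should then work within the same session:

r_img = s.get('*domain*/*protectedimage*.jpg')
print(r_img.status_code)  # expect 200 once the session cookie is accepted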