Python SSL post using requests

The goal here is to be able to post username and password information to https://canvas.instructure.com/login so I can access and scrape information from a page once logged in.
I know the login information and the names of the login and password fields (pseudonym_session[user_id] and pseudonym_session[password]), but I'm not sure how to use requests.Session() to get past the login page.
import requests
s = requests.Session()
payload = {'pseudonym_session[user_id]': 'bond', 'pseudonym_session[password]': 'james bond'}
r = s.post('https://canvas.instructure.com/login', data=payload)
r = s.get('https://canvas.instructure.com/(The page I want)')
print(r.content)
Thanks for your time!

Actually, the code posted works fine; I had a spelling error on my end with the password. Now I'm just using Beautiful Soup to find what I need on the page after logging in.

Put Chrome (or your browser of choice) into debug mode (in Chrome: Tools -> Developer Tools -> Network) and do a manual login. Then follow closely what happens and replicate it in your code. I believe that is the only way, unless the website has a documented API.
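For example, a common pattern when replicating a browser login is to first GET the login page and copy any hidden form fields (CSRF tokens and the like) into the POST payload. A minimal sketch with BeautifulSoup, where the HTML snippet and the authenticity_token value are made-up stand-ins for what the real login page returns:

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML you would get from session.get(login_url).text;
# the hidden token value here is invented for illustration.
login_html = '''
<form action="/login" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="pseudonym_session[user_id]">
  <input type="password" name="pseudonym_session[password]">
</form>
'''

soup = BeautifulSoup(login_html, 'html.parser')

# Copy every hidden input into the payload, then add the credentials
payload = {tag['name']: tag.get('value', '')
           for tag in soup.select('input[type=hidden]')
           if tag.get('name')}
payload['pseudonym_session[user_id]'] = 'bond'
payload['pseudonym_session[password]'] = 'james bond'
# session.post(login_url, data=payload) would then submit the same
# fields the browser does
```

This mirrors what you see in the Network tab: the browser sends the hidden fields along with the visible ones, so your script should too.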

Related

Log in to website using Python and Requests module?

I'm writing an AliExpress web scraper using Python and the Requests module along with BeautifulSoup, and it was working well until I ran into a problem: I get redirected to a login page at random. My solution is to simply log in at the start of my session before scraping, but I don't know how to log in.
The login page (https://login.aliexpress.com) requires only the username and password, but when I try to enter them with my code and test to see if I'm logged in by going to https://home.aliexpress.com/index.htm and looking at the html, it fails as it redirects me back to the login page.
My code after trying multiple solutions to no avail:
import requests

LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
    "loginId": "myemail@email.com",
    "password": "mypassword"
}

with requests.Session() as sess:
    # go to login page
    sess.get(LOGIN_URL)
    # attempt to log in with my login info
    sess.post(LOGIN_URL, data=LOGIN_INFO)
    # go to 'My AliExpress' page to verify successful login
    success = sess.get("https://home.aliexpress.com/index.htm")
    # manually check html to see if I was sent to the login page again
    print(success.text)
This is pretty much what's left after my many failed attempts. Some of the things I've tried are:
- Looking at the cookies after the sess.get(LOGIN_URL); it returns the following (in key : value format), but I don't know what to do with them:
  - ali_apache_tracktmp :
  - ali_apache_track :
  - xman_f : t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
  - JSESSIONID : 30678741D7473C80BEB85825718FB1C6
  - acs_usuc_t : acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
  - xman_us_f : x_l=0
  - ali_apache_id : 23.76.146.14.1510893827939.187695.4
  - xman_t : PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
- Looking for a CSRF token; I only found the text after '_csrf=' in the fifth cookie above, and using it didn't work.
- Looking at the HTML form sent when you log in, but I don't know HTML and can only tell that it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).
- Changing the "mypassword" in my code to the text in the password2 field in the image above, and changing the "password" key to "password2" too.
- Googling for a few hours without finding anything that would work.
At this point I'm at my wits' end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning), I don't know HTML beyond what I've picked up from a few scraping tutorials, and I was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.
I'm using Python 3.5. If any more info is needed, let me know. My brain is just about turned completely to mush after being stuck and awake for so long.
I have a suspicion this will not work the way you want it to.
Even after somehow getting past the login prompt, the following page presents a "slider verification", which to my knowledge Requests is unable to do anything about. (If there is a method, please let me know.)
I have been trying to use cookies instead:
import requests

session = requests.Session()
cj = requests.cookies.RequestsCookieJar()
cj.set('KEY', 'VALUE')  # name/value copied from the browser's cookies
session.cookies = cj
# url, headers and proxies are defined elsewhere in the script
response = session.get(url, timeout=5, headers=headers, proxies=proxies)
Previously the scraper worked using headers and proxies for a time, but recently it always prompts a login.
I have tried all the keys and values in the cookies as well to no avail.
An idea would be to use Selenium to log in and capture the cookies, then pass them to the requests session.
AntoG has a solution for doing this:
https://stackoverflow.com/a/42114843
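A sketch of that hand-off. In a real run the list of cookie dicts comes from Selenium's driver.get_cookies() after logging in through the browser; the values below are made up to show the shape of the data:

```python
import requests

# In practice: browser_cookies = driver.get_cookies() after a Selenium login.
# These entries are invented placeholders.
browser_cookies = [
    {'name': 'JSESSIONID', 'value': '30678741D7473C80BEB85825718FB1C6'},
    {'name': 'xman_us_f', 'value': 'x_l=0'},
]

session = requests.Session()
for cookie in browser_cookies:
    # requests only needs the name/value pair; domain and path are optional
    session.cookies.set(cookie['name'], cookie['value'])

# session.get(...) now sends those cookies, as if the browser login had
# happened inside this session
```

The point of the split is that Selenium can drive the JS-heavy login (including widgets like the slider), while requests does the fast scraping afterwards with the authenticated cookies.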

Weird result with Requests library to do web page login

I'm trying to use the session object in Requests library to make POST requests with login credential by following a practice example from the book "Web Scraping with Python". However, I have encountered a weird problem (probably a bug) with it. The author of the book had set up a simple login web page for test:
http://pythonscraping.com/pages/cookies/login.html.
The username can be anything as long as the password is "password"; the login process then redirects to another page:
http://pythonscraping.com/pages/cookies/welcome.php.
The Python code that I use to run against it is as follows:
import requests
ssn = requests.Session()
param = {'password':'password','username':'letmein'}
r = ssn.post('http://pythonscraping.com/pages/cookies/welcome.php',data=param)
print('Cookies set to :')
print(r.cookies.get_dict())
print('Server response :')
print(r.text)
And the output of the above code is:
Cookies set to :
{'loggedin': '1', 'username': 'letmein'}
Server response :
<h2>Welcome to the Website!</h2>
Whoops! You logged in wrong. Try again with any username, and the password "password" <br>Log in here
Apparently the login has failed. The thing I don't understand is why: in the program output, among the variables set by the request, the password variable is missing. Am I missing something here, or is it a bug in the Requests library?
P.S. I'm running the program with Python 3.6.2 and Requests 2.18.4.
I tried logging in using a browser. Apparently the password is "password" including the quotes, not password; the starting and ending quotes are part of the password.
I think @PRMoureu is wrong in suggesting that the URL should be the login page. The URL should be http://pythonscraping.com/pages/cookies/welcome.php only, because the actual login functionality is implemented on that page.
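Putting the two answers together, the only change needed to the question's code is to include the literal quotes in the password string (the POST line is commented out here; uncomment it to run against the live test page):

```python
import requests

ssn = requests.Session()
# The double quotes are literally part of the password on this test page
param = {'password': '"password"', 'username': 'letmein'}
# r = ssn.post('http://pythonscraping.com/pages/cookies/welcome.php', data=param)
# print(r.text)
```

In Python, wrapping the string in single quotes lets the double quotes be sent as part of the form value, which is what the page's PHP check compares against.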

How to make HTTP POST on website that uses asp.net?

I'm using Python library requests for this, but I can't seem to be able to log in to this website.
The url is https://www.bet365affiliates.com/ui/pages/affiliates/, and I've been trying post requests to https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1 with the data of "ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox", "ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox", etc, but I never seem to be able to get logged in.
Could someone more experienced check the page's source code and tell me what I am missing here?
The solution could be this. Note that you could do it without Selenium: first GET the main affiliates page, and from the response you can fetch all the required information (which I gather here by XPath). I just didn't have enough time to rewrite it fully in requests.
To gather the information from the response data you could use an XML tree library; with the same XPath expressions you can easily find all the required values.
import os

import requests
from selenium import webdriver

Password = 'YOURPASS'
Username = 'YOURUSERNAME'

browser = webdriver.Chrome(os.getcwd() + "/" + "Chromedriver.exe")
browser.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')

# Read the ASP.NET hidden fields out of the rendered page
VIEWSTATE = browser.find_element_by_xpath('//*[@id="__VIEWSTATE"]').get_attribute('value')
SESSIONID = browser.find_element_by_xpath('//*[@id="CMSessionId"]').get_attribute('value')
PREVPAG = browser.find_element_by_xpath('//*[@id="__PREVIOUSPAGE"]').get_attribute('value')
EVENTVALIDATION = browser.find_element_by_xpath('//*[@id="__EVENTVALIDATION"]').get_attribute('value')

# Copy the browser's cookies into a requests session
cookies = browser.get_cookies()
session = requests.session()
for cookie in cookies:
    print(cookie['name'])
    print(cookie['value'])
    session.cookies.set(cookie['name'], cookie['value'])

payload = {'ctl00_AjaxScriptManager_HiddenField': '',
           '__EVENTTARGET': 'ctl00$MasterHeaderPlaceHolder$ctl00$goButton',
           '__EVENTARGUMENT': '',
           '__VIEWSTATE': VIEWSTATE,
           '__PREVIOUSPAGE': PREVPAG,
           '__EVENTVALIDATION': EVENTVALIDATION,
           'txtPassword': Password,
           'txtUserName': Username,
           'CMSessionId': SESSIONID,
           'returnURL': '/ui/pages/affiliates/Affiliates.aspx',
           'ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox': Username,
           'ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox': Password,
           'ctl00$MasterHeaderPlaceHolder$ctl00$tempPasswordTextbox': 'Password'}

session.post('https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1', data=payload)
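The requests-only variant alluded to above would parse the hidden fields out of the page HTML instead of driving a browser. A minimal sketch using the standard library's xml.etree.ElementTree (the HTML below is a made-up, well-formed stand-in for the real page; real-world HTML is often not valid XML, in which case lxml.html or BeautifulSoup is more robust):

```python
import xml.etree.ElementTree as ET

# Stand-in for the HTML returned by GETting the affiliates page;
# the field values are invented.
page_html = '''<form>
  <input type="hidden" id="__VIEWSTATE" name="__VIEWSTATE" value="dDwtMTYx" />
  <input type="hidden" id="__EVENTVALIDATION" name="__EVENTVALIDATION" value="abc123" />
</form>'''

root = ET.fromstring(page_html)
# The same id-based lookups used with Selenium above work here too
viewstate = root.find(".//input[@id='__VIEWSTATE']").get('value')
eventvalidation = root.find(".//input[@id='__EVENTVALIDATION']").get('value')
```

With those values extracted, the payload from the answer above can be built and POSTed in the same session that performed the GET, so the ASP.NET cookies are already in place.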
Did you inspect the HTTP request the browser makes to log you in?
You should replicate it.

Retrieve awesomebox.io scan page content with python-requests

I'm trying to retrieve the page content from https://www.awesomebox.io/scan.
But before I can do that, I need to be logged in. At the moment I still get the login page content, because the site redirects me since I'm not logged in.
Does anybody know how to get the scan page content with python-requests?
I've tried multiple requests authentication methods.
My code so far:
import requests

session = requests.session()
loginURL = 'http://www.awesomebox.io/login'
payload = {'username': '******', 'password': '******'}
session.post(loginURL, data=payload)

scanURL = "http://awesomebox.io/scan"
scanpage = session.get(scanURL)
print(scanpage.content)
I don't have an account with awesomebox, so I don't know for sure. But nowadays a website login is usually more sophisticated and secure than a simple POST of username and password.
To find out, do a manual login while tracing the web traffic in the developer mode of the browser (e.g. F12 for MSIE or Edge) and store it in a .har file. There you can (hopefully) see how the login procedure is implemented and build the same sequence into your requests session.
Sometimes there is a hidden field in the form (e.g. "lt" for login ticket) that was populated via JS by a previous page. Sometimes it's even more complex, e.g. if a secret login runs via Ajax in the background. In that case you see nothing in the F12 view and have to dig into the JS scripts.
Thank you, I noticed I forgot a hidden parameter.
I added the csrfmiddlewaretoken.
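For a Django-style form like this one, csrfmiddlewaretoken is a hidden input on the login page, so it has to be read out of the GET response before POSTing. A sketch using a regex; the HTML string and token value are made-up stand-ins for what session.get(loginURL).text would return:

```python
import re

# Stand-in for the login page HTML returned by session.get(loginURL).text;
# the token value is invented.
login_page = ('<form method="post">'
              '<input type="hidden" name="csrfmiddlewaretoken" value="tok123">'
              '</form>')

match = re.search(r'name="csrfmiddlewaretoken" value="([^"]+)"', login_page)
payload = {
    'username': '******',
    'password': '******',
    'csrfmiddlewaretoken': match.group(1),
}
# session.post(loginURL, data=payload) should then pass the CSRF check,
# provided the same session made the GET so it also carries the csrftoken cookie
```

Doing both the GET and the POST in the same requests.Session matters here: Django checks the token in the form against the csrftoken cookie set on the GET.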

Python script is scraping the wrong page source. I think it's failing to login properly?

This script succeeds at getting a 200 response, getting a cookie, and returning reddit's stock homepage source. However, it is supposed to get the source of the "recent activity" subpage, which can only be accessed after logging in. This makes me think it's failing to log in properly, but the username and password are accurate; I've double-checked that.
#!/usr/bin/python
import requests
import urllib2

auth = ('username', 'password')
with requests.Session() as s:  # requests.session() does not take an auth argument
    s.auth = auth
    c = s.get('http://www.reddit.com')
    cookies = c.cookies

# Build one opener and attach every cookie to it
opener = urllib2.build_opener()
for k, v in cookies.items():
    opener.addheaders.append(('cookie', '{}={}'.format(k, v)))
f = opener.open('http://www.reddit.com/account-activity')
print f.read()
It looks like you're using standard HTTP Basic authentication, which is not what Reddit uses to log in to its web site. (Almost no web sites use HTTP Basic, which pops up a modal dialog box requesting authentication; they implement their own username/password form instead.)
What you'll need to do is get the home page, read the login form fields, fill in the username and password, POST the response back to the web site, get the resulting cookie, then use the cookie in future requests. There may be quite a number of other details for you to work out too, so you'll have to experiment.
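One detail worth adding to that advice: a requests.Session keeps whatever cookies the login POST sets and replays them on matching later requests, so the urllib2 hand-off in the question isn't needed at all. A small demonstration; the cookie name and value are made up, and in a real run they would be set by the site's response rather than by hand:

```python
import requests

s = requests.Session()
# After a successful form POST, the session's jar would hold whatever
# cookie the site set; we seed one by hand here to show the mechanism.
s.cookies.set('reddit_session', 'abc123', domain='www.reddit.com')

# Any request the session prepares for that host now carries the cookie
req = requests.Request('GET', 'http://www.reddit.com/account-activity')
prepared = s.prepare_request(req)
print(prepared.headers['Cookie'])  # reddit_session=abc123
```

So the whole flow (GET form, POST credentials, GET the protected page) can stay inside one Session object with no manual cookie plumbing.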
I think we might be having the same problem. I get status code 200 OK, but the script never logs me in. I'm getting some suggestions and help; hopefully you'll let me know what works for you too. It seems Reddit uses the same system.
Check out this page where my problem is being discussed:
Authentication issue using requests on aspx site
