On multiple login pages, a Google login is required in order to proceed. I would like to use the requests library in Python to log myself in. Normally this would be easy with requests, but I have not been able to get it to work. I am not sure if this is due to some restriction Google has made (perhaps I need to use their API?), or if it is because the Google login page requires the user to enter their email first, press submit, and then enter their password, and so on.
This problem has been asked before over here, but none of the solutions work for me. Currently I've been using the code provided in this solution: Log into Google account using Python?, as shown here:
from bs4 import BeautifulSoup
import requests

my_email = "email_placeholder@gmail.com"  # my email is here
my_pass = "my_password"  # my password is here
form_data = {'Email': my_email, 'Passwd': my_pass}
post_url = "https://accounts.google.com/signin/challenge/sl/password"

with requests.Session() as s:
    # Copy the login form's hidden fields into the payload
    soup = BeautifulSoup(s.get("https://mail.google.com").text, "html.parser")
    for inp in soup.select("#gaia_loginform input[name]"):
        if inp["name"] not in form_data:
            form_data[inp["name"]] = inp["value"]
    s.post(post_url, data=form_data)
    html = s.get("https://mail.google.com/mail/u/0/#inbox").content
    print(my_email in s.get('https://mail.google.com/mail/u/0/#inbox').text)  # Prints 'False', should print 'True'
As you can see, the code at the end prints False. Furthermore, if I write the html to a file and open that file in a browser, the page I get is the default Google login page, indicating the login has not worked.
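For reference, this is how I dump the response for inspection (reusing the html bytes from the script above):

# Open the resulting file in a browser to see what the server returned
with open("google_response.html", "wb") as f:
    f.write(html)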
Related
I want to create a program where I can check my grades using Python, and I have the code to web-scrape data, but I do not know how to log into this specific website. The website is https://hac.chicousd.org/LoginParent.aspx?page=Default.aspx, and if you need them I can give my username and password. I have tried using requests and urllib and neither works. I appreciate any help given.
Try using MechanicalSoup. It allows you to navigate a website just like you would normally.
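A minimal sketch of that approach with MechanicalSoup; the form selector and the field names are assumptions (taken from the other answers below) that you would confirm by inspecting the login page's HTML:

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open("https://hac.chicousd.org/LoginParent.aspx?page=Default.aspx")
browser.select_form("form")  # assumes the login form is the first <form> on the page
browser["portalAccountUsername"] = "yourstudentemail@school.com"  # assumed field name
browser["portalAccountPassword"] = "your_password"                # assumed field name
response = browser.submit_selected()
print(response.url)  # check where the login landed you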
As pointed out in the comments, one possibility is to use Selenium, a browser-automation tool. However, you can also use requests.Session to send a POST request with a payload containing the email, and then a GET request for whatever portal page you wish to view afterwards:
import requests

r = requests.Session()
payload = {'portalAccountUsername': 'yourstudentemail@school.com'}
r.post('https://hac.chicousd.org/LoginParent.aspx?page=Default.aspx', data=payload)
Then, with the same Session instance r, you can send a GET request to a page on the portal that is only visible to authenticated users:
data = r.get('https://hac.chicousd.org/some_student_only_page').text
Note that the keys of the payload dictionary must all be valid <input> "name" values from the site's HTML.
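Since this is an ASP.NET page, the form likely also expects hidden fields such as __VIEWSTATE and __EVENTVALIDATION in the POST body. A sketch of collecting every named <input> from the login page first (the password field name is an assumption):

import requests
from bs4 import BeautifulSoup

login_url = 'https://hac.chicousd.org/LoginParent.aspx?page=Default.aspx'
with requests.Session() as s:
    soup = BeautifulSoup(s.get(login_url).text, 'html.parser')
    # Start from whatever hidden inputs the page ships with (__VIEWSTATE etc.)
    payload = {inp['name']: inp.get('value', '') for inp in soup.select('input[name]')}
    payload['portalAccountUsername'] = 'yourstudentemail@school.com'
    payload['portalAccountPassword'] = 'your_password'  # assumed field name
    s.post(login_url, data=payload)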
As others have said, you can use Selenium. You should also use time to pause the program for a few seconds before entering your password. First install Selenium from your command prompt (pip install selenium) along with a webdriver (for Chrome: pip install chromedriver_installer). Then you can use them in your code.
from selenium import webdriver
import time
Then, you should open the web page with the webdriver:
browser = webdriver.Chrome('C:\\Users...\\chromedriver.exe')
browser.get('The website address')
The next step is to find the names of the elements on the web page for entering your username and password, and the XPaths for the buttons:
username = browser.find_element_by_id('portalAccountUsername')
username.send_keys('your email')
next_button = browser.find_element_by_xpath('//*[@id="next"]')
next_button.click()
time.sleep(2)  # give the password field a moment to appear
password = browser.find_element_by_id('portalAccountPassword')
password.send_keys('your password')
sign_in = browser.find_element_by_xpath('//*[@id="LoginButton"]')
sign_in.click()
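As an aside, an explicit wait is usually more reliable than a fixed sleep; a sketch using Selenium's own waiting helpers for the same step:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the password field instead of sleeping blindly
password = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, 'portalAccountPassword')))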
I am trying to log in to LinkedIn using the Python requests session module, but I am not able to access other pages. Please help me out.
My code is like this:
import requests
from bs4 import BeautifulSoup

# Get login form
URL = 'https://www.linkedin.com/uas/login'
session = requests.session()
login_response = session.get(URL)
login = BeautifulSoup(login_response.text, "lxml")

# Get hidden form inputs
inputs = login.find('form', {'name': 'login'}).findAll(
    'input', {'type': ['hidden', 'submit']})

# Create POST data from the hidden inputs, then add the credentials
post = {input.get('name'): input.get('value') for input in inputs}
post['session_key'] = 'username'
post['session_password'] = 'password'

# Post login
post_response = session.post('https://www.linkedin.com/uas/login-submit', data=post)
notify_response = session.get('https://www.linkedin.com/company-beta/3067/')
notify = BeautifulSoup(notify_response.text, "lxml")
print(notify.title)
Well, I hope I'm not saying the wrong thing, but I had to crawl LinkedIn some weeks ago, and I saw that LinkedIn is pretty good at spotting bots. I'm almost sure that is your issue here (try printing the output of post_response; you will likely see you are on a captcha page or something like that).
Plot twist: I succeeded in logging into LinkedIn by running Selenium, logging in to LinkedIn by hand, and using pickle to save the cookies to a file.
Then, instead of using the login form, I just loaded the cookies into Selenium and refreshed the page; tadam, logged in. I think this can also be done with requests.
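A rough sketch of that cookie-reuse approach, assuming you log in by hand in the Selenium-driven window before the cookies are saved, and that replaying the cookies in a requests session works for the pages you need:

import pickle
import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com')
input('Log in manually in the browser window, then press Enter...')
pickle.dump(driver.get_cookies(), open('cookies.pkl', 'wb'))
driver.quit()

# Later: replay the saved cookies in a requests session
session = requests.Session()
for cookie in pickle.load(open('cookies.pkl', 'rb')):
    session.cookies.set(cookie['name'], cookie['value'])
print(session.get('https://www.linkedin.com/feed/').status_code)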
I am attempting to scrape a website using the following code
import re
import requests

def get_csrf(page):
    # Pull the CSRF token out of the login page's HTML
    matchme = r'name="csrfToken" value="(.*)" /'
    csrf = re.search(matchme, str(page))
    csrf = csrf.group(1)
    return csrf

def login():
    login_url = 'https://www.edline.net/InterstitialLogin.page'
    with requests.Session() as s:
        login_page = s.get(login_url)
        csrf = get_csrf(login_page.text)
        username = 'USER'
        password = 'PASS'
        login = {'screenName': username,
                 'kclq': password,
                 'csrfToken': csrf,
                 'TCNK': 'authenticationEntryComponent',
                 'submitEvent': '1',
                 'enterClicked': 'true',
                 'ajaxSupported': 'yes'}
        page = s.post(login_url, data=login)
        r = s.get("https://www.edline.net/UserDocList.page?")
        print(r.text)

login()
Where I log into https://www.edline.net/InterstitialLogin.page, which is successful, but the problem I have is when I try to do
r = s.get("https://www.edline.net/UserDocList.page?")
print(r.text)
It doesn't print the expected page; instead it throws an error. Upon further testing I discovered that it throws this error even if you try to go directly to the page from a browser. So when I investigated the page source, I found that the button used to link to the page I'm trying to scrape uses the following code:
Private Reports
So essentially I am looking for a way to trigger the above javascript code in python in order to scrape the resulting page.
It is impossible to answer this question without more context than this single link.
However, the first thing you want to check, in the case of JavaScript-driven content generation, is the requests made by your web page when you click that link.
To do this, take a look at the Network panel in your browser's developer tools. Record the requests being made, looking especially for XHR requests. Then you can try to replicate them, e.g. with the requests library:
content = requests.get('xhr-url').text  # 'xhr-url' is the recorded request URL
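If the endpoint expects to be called like an XHR, you may also need to copy headers from the recorded request. A sketch with placeholder values:

import requests

# The URL and headers here are placeholders; copy the real ones from the
# request you recorded in the Network panel
response = requests.get(
    'https://www.edline.net/some-xhr-endpoint',
    headers={'X-Requested-With': 'XMLHttpRequest'},
)
print(response.text)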
The goal here is to be able to post username and password information to https://canvas.instructure.com/login so I can access and scrape information from a page once logged in.
I know the login information and the names of the login and password fields (pseudonym_session[user_id] and pseudonym_session[password]), but I'm not sure how to use requests.Session() to get past the login page.
import requests
s = requests.Session()
payload = {'pseudonym_session[user_id]': 'bond', 'pseudonym_session[password]': 'james bond'}
r = s.post('https://canvas.instructure.com/login', data=payload)
r = s.get('https://canvas.instructure.com/(The page I want)')
print(r.content)
Thanks for your time!
Actually, the code posted works fine. I had a spelling error on my end with the password. Now I'm just using Beautiful Soup to find what I need on the page after logging in.
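For instance, a quick sketch of pulling something out of the logged-in page (the element id is hypothetical):

from bs4 import BeautifulSoup

soup = BeautifulSoup(r.content, 'html.parser')  # r is the GET response from above
print(soup.find('div', {'id': 'dashboard'}))    # hypothetical element id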
Put Chrome (or your browser of choice) into debug mode (Tools -> Developer Tools -> Network in Chrome) and do a manual login. Then follow closely what happens and replicate it in your code. I believe that is the only way, unless the website has a documented API.
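For example, if the recorded login request carries a hidden CSRF field (Canvas is a Rails app, so an authenticity_token is a plausible candidate; the field name here is an assumption), you would scrape it first and send it along:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
soup = BeautifulSoup(s.get('https://canvas.instructure.com/login').text, 'html.parser')
token = soup.find('input', {'name': 'authenticity_token'})  # assumed hidden field
payload = {'pseudonym_session[user_id]': 'bond',
           'pseudonym_session[password]': 'james bond'}
if token is not None:
    payload['authenticity_token'] = token.get('value')
s.post('https://canvas.instructure.com/login', data=payload)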
I am trying to use Python 2.7.6 to log in to a website. The login logic consists of two steps on two web pages:
Entering the user ID and password on page A, which gives back a cookie;
That cookie is then sent in the header to authenticate the login on page B.
The login only succeeds once B has authenticated it.
There’s a post here, HTTP POST and GET with cookies for authentication in python, asking a similar question. One solution is to use requests:
import requests

url_0 = "http://www.PAGE_A.com/"  # http://webapp.pucrs.br/consulta/principal.jsp in the original example
url = "http://www.PAGE_B.com/"  # https://webapp.pucrs.br/consulta/servlet/consulta.aluno.ValidaAluno in the original example
data = {"field1_username": "ABC", "field_password": "123"}

s = requests.session()
s.get(url_0)  # page A sets the session cookie
r = s.post(url, data=data)  # the same session sends that cookie to page B
I tried this in Python for my case, and it doesn't return an error message, so I guess it works fine.
But the question is: how do I know it's logged in?
I added the code below to print the logged-in page, to see if it returned the right page.
import mechanize
br = mechanize.Browser()
open_page = br.open("http://www.PAGE_B.com/")
read_page = open_page.read()
print read_page
However, it still shows the content from before login. What went wrong?
How about just going with one of the two?
import mechanize
browser = mechanize.Browser()
browser.addheaders = [('...')]
browser.open(YOUR_URL)
browser.select_form(FORM_NAME)
browser.form['USERNAME_FIELD'] = 'abc'
browser.form['PASSWORD_FIELD'] = 'password'
browser.submit()
print browser.response().read()
print browser.geturl()
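Alternatively, staying with requests end to end: the key is to reuse the same Session for both pages, so the cookie from page A travels to page B (mixing a requests session with a separate mechanize browser means the mechanize browser never sees the login cookie). A sketch using the placeholder names from the question:

import requests

with requests.Session() as s:
    s.get("http://www.PAGE_A.com/")  # page A sets the session cookie
    r = s.post("http://www.PAGE_B.com/",
               data={"field1_username": "ABC", "field_password": "123"})
    # Crude logged-in check: look for something that only appears after
    # login, e.g. your name or a logout link (site-specific assumption)
    print("logout" in r.text.lower())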