I would like to integrate the Python Selenium and Requests modules to authenticate on a website.
I am using the following code:
import requests
from selenium import webdriver
driver = webdriver.Firefox()
url = "some_url" #a redirect to a login page occurs
driver.get(url) #the login page is displayed
#making a persistent connection to authenticate
params = {'os_username':'username', 'os_password':'password'}
s = requests.Session()
resp = s.post(url, data=params) #I get a 200 status_code
#passing the cookies to the driver (add_cookie takes one cookie dict at a time)
for name, value in s.cookies.get_dict().items():
    driver.add_cookie({'name': name, 'value': value})
The problem is that when I go back to the browser and try to access the url, the login page is still shown, even though I passed the cookies generated by the requests session.
How can I modify the code above to get through the authentication page?
I finally found out what the problem was.
Before making the POST request with the requests library, I should have passed the browser's cookies to the session first.
The code is as follows:
import requests
from selenium import webdriver
driver = webdriver.Firefox()
url = "some_url" #a redirect to a login page occurs
driver.get(url)
#storing the cookies generated by the browser
request_cookies_browser = driver.get_cookies()
#making a persistent connection using the requests library
params = {'os_username':'username', 'os_password':'password'}
s = requests.Session()
#passing the cookies generated from the browser to the session
for cookie in request_cookies_browser:
    s.cookies.set(cookie['name'], cookie['value'])
resp = s.post(url, data=params) #I get a 200 status_code
#passing the cookies of the response to the browser
dict_resp_cookies = resp.cookies.get_dict()
for name, value in dict_resp_cookies.items():
    driver.add_cookie({'name': name, 'value': value})
#the browser now contains the cookies generated from the authentication
driver.get(url)
I had some issues with this code because it set duplicate cookies on top of the browser's original (pre-login) cookies. I solved this by clearing the browser's cookies before setting the login cookies, using this command:
driver.delete_all_cookies()
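For reference, a minimal sketch of where that call fits, reusing the names from the answer above: clear the browser's cookies after the POST, then set only the cookies returned by the authentication response.
driver.get(url)                 #load the domain so its cookies can be set
driver.delete_all_cookies()     #drop the stale pre-login cookies
for name, value in dict_resp_cookies.items():
    driver.add_cookie({'name': name, 'value': value})
driver.get(url)                 #reload, now authenticated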
Related
I have to log into a site (for example I will use facebook.com). I can manage the login process using Selenium, but I need to do it with a POST. I've tried to use requests, but I'm not able to pass the needed info to the Selenium webdriver in order to enter the site as a logged-in user. I found online that there is a library that integrates Selenium and requests, https://pypi.org/project/selenium-requests/ , but the problem is that there is no documentation and I'm stuck in the same place.
With selenium-requests
from seleniumrequests import Chrome

webdriver = Chrome()
url = "https://www.facebook.com"
webdriver.get(url)
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
resp = webdriver.request('POST','https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', params)
webdriver.get(url)
# I hoped that the newly opened page would be the one with me logged in, but it did not work
With Selenium and requests passing the cookies
import requests
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://www.facebook.com"
driver.get(url)
#storing the cookies generated by the browser
request_cookies_browser = driver.get_cookies()
#making a persistent connection using the requests library
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
s = requests.Session()
#passing the cookies generated from the browser to the session
for cookie in request_cookies_browser:
    s.cookies.set(cookie['name'], cookie['value'])
resp = s.post('https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', data=params) #I get a 200 status_code
#passing the cookies of the response to the browser
dict_resp_cookies = resp.cookies.get_dict()
for name, value in dict_resp_cookies.items():
    driver.add_cookie({'name': name, 'value': value})
driver.get(url)
In both cases, if I print the cookies at the end, something seems to have changed from the beginning, but the page remains the one with the login form.
These are the codes I've tried; I put both attempts, but it is sufficient to find the solution to one of the two.
Can someone help me and tell me what I have to do or change to open the page logged in?
Thank you in advance!
I had the same problem.
In your code, you just pass params as is, so the credentials never end up in the form body of the POST. Pass them with the data keyword, data=params:
resp = webdriver.request('POST','https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', data=params)
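For completeness, a sketch of the corrected attempt; it reuses the question's own URL and field names, which I have not verified:
from seleniumrequests import Chrome

webdriver = Chrome()
url = "https://www.facebook.com"
webdriver.get(url)
params = {
    'email': 'my_email',
    'pass': 'my_password'
}
# data= sends the credentials as the form body
resp = webdriver.request('POST', 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110', data=params)
webdriver.get(url)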
I am new to web scraping and would like to learn how to do it properly and politely. My problem is similar to [this][1].
'So I am trying to log into and navigate to a page using python and requests. I'm pretty sure I am getting logged in, but once I try to navigate to a page the HTML I print from that page states you must be logged in to see this page.'
I've checked robots.txt of the website I would like to scrape. Is there something which prevents me from scraping?
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
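These Disallow rules only block the listed paths; any other path is allowed. A quick way to check a specific url programmatically (a sketch I'm adding, with example.com standing in for the real site):
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # replace with the real site
rp.read()
print(rp.can_fetch("*", "https://example.com/members/profile"))  # False (disallowed)
print(rp.can_fetch("*", "https://example.com/news/article"))     # True (not listed)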
My code with the solution from the link above which does not work for me:
import requests
from bs4 import BeautifulSoup
login_page = <login url>
link = <required url>
payload = {
    "username": <some username>,
    "password": <some password>
}
p = requests.post(login_page, data=payload)
cookies = p.cookies
page_response = requests.get(link, cookies=cookies)
page_content = BeautifulSoup(page_response.content, "html.parser")
The RequestsCookieJar shows the cookie ASP.NET_SessionId=1adqylnfxbqf5n45p0ooy345 for the website (from the p.cookies command)
Output of p.status_code : 200
UPDATE:
s = requests.session()
doesn't solve my problem. I had tried this before I started looking into cookies.
Update 2:
I am trying to collect news from a particular web site. First I filtered the news with a search word and saved the links that appeared on the first page with python requests + beautifulsoup. Now I would like to go through the links and extract the news from them. The full text is visible with credentials only. There is no separate login page; it's possible to log in from any page. There is a login button, and when you move the mouse over it a login window appears, as in the attached image. I tried to log in both via the main page and via the page from which I would like to extract the text (not at the same time, but in separate trials). Neither works.
I also tried to find a csrf token by searching for "csrf_token", "authentication_token", "csrfmiddlewaretoken", "csrf", "auth". Nothing was found in the HTML of the web pages.
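One way to double-check for hidden token fields is to enumerate every input of every form in the page, rather than searching for known names. A minimal sketch (my suggestion, assuming the login form is present in the static HTML rather than injected by JavaScript; login_page is the login url from the code above):
import requests
from bs4 import BeautifulSoup

resp = requests.get(login_page)
soup = BeautifulSoup(resp.content, "html.parser")
for form in soup.find_all("form"):
    print("form action:", form.get("action"))
    for field in form.find_all("input"):
        # hidden inputs often carry the token the server expects back
        print("  ", field.get("type"), field.get("name"), field.get("value"))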
You can use requests.Session() to stay logged in, but you have to save the login cookie as a json file. The example below shows a scraping snippet that saves the facebook login session as a cookie in json format:
import selenium
import mechanicalsoup
import json
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import requests
import time
s = requests.Session()
email = raw_input("Enter your facebook login username/email: ")
password = raw_input("Enter your facebook password: ")

def get_driver():
    driver = webdriver.Chrome(executable_path='your_path_to_chrome_driver')
    driver.wait = WebDriverWait(driver, 3)
    return driver

def get_url_cookie(driver):
    driver.get('https://facebook.com')
    driver.find_element_by_name('email').send_keys(email)
    driver.find_element_by_name('pass').send_keys(password)
    driver.find_element_by_id('loginbutton').click()
    cookies_list = driver.get_cookies()
    with open('facebook_cookie.json', 'w') as script:
        json.dump(cookies_list, script)

driver = get_driver()
get_url_cookie(driver)
The code above gets you the login session cookies using driver.get_cookies() and saves them as a json file. To use the cookies, just load them using:
with open('facebook_cookie.json') as c:
    load = json.load(c)
for cookie in load:
    s.cookies.set(cookie['name'], cookie['value'])
url = 'https://facebook.com/the_url_you_want_to_visit_on_facebook'
browser = mechanicalsoup.StatefulBrowser(session=s)
browser.open(url)
and you get your session loaded...
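One caveat: the find_element_by_* helpers used above were removed in Selenium 4. With a current Selenium, the login function would use the By locators that are already imported; a sketch of the same steps:
def get_url_cookie(driver):
    driver.get('https://facebook.com')
    driver.find_element(By.NAME, 'email').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)
    driver.find_element(By.ID, 'loginbutton').click()
    cookies_list = driver.get_cookies()
    with open('facebook_cookie.json', 'w') as script:
        json.dump(cookies_list, script)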
I'm trying to write a web crawler that downloads a CSV file from a dynamic url.
The url is like http://aaa/bbb.mcv/Download?path=xxxx.csv
When I put this url into my Chrome browser, the download starts immediately and the page doesn't change.
I can't even find any request in the developer tools.
I've tried two ways to get the file:
1. put the url in selenium:
driver.get(url)
2. get the file with the requests lib:
requests.get(url)
Neither worked...
Any advice?
Output of the two ways:
1. I took a screenshot and the page doesn't seem to change (just like in Chrome).
2. I printed out the data I got and it looks like an HTML file; when I open it in a browser it is a login page.
import requests

url = '...'
save_location = '...'
session = requests.session()
response = session.get(url)
with open(save_location, 'wb') as t:
    for chunk in response.iter_content(1024):
        t.write(chunk)
Thanks for everyone's help!
I finally found the problem...
I logged into the website with Selenium, but I used requests to download the file, and the requests session didn't have any authentication information!
So my solution is to get the cookies from Selenium first,
then pass them to requests!
Here is my code:
cookies = driver.get_cookies() #selenium web driver
s = requests.Session()
for cookie in cookies:
    s.cookies.set(cookie['name'], cookie['value'])
response = s.get(url)
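Putting the two snippets together, a sketch of the whole flow; login_url, url and save_location are placeholders (login_url is my own stand-in for however you reach the login page), and stream=True is my addition so the CSV is written to disk in chunks instead of being held in memory:
import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(login_url)           # log in here, manually or with find_element calls
cookies = driver.get_cookies()  # grab the authenticated session cookies

s = requests.Session()
for cookie in cookies:
    s.cookies.set(cookie['name'], cookie['value'])

response = s.get(url, stream=True)  # the dynamic CSV url
with open(save_location, 'wb') as t:
    for chunk in response.iter_content(1024):
        t.write(chunk)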
I'm trying to find a way to get the response of a POST method executed through a headless browser.
session = requests.Session()
session.get(<url here>)
print session.cookies
r = session.post(url).content
print r
The problem is that the response r is full of javascript and I can't use Selenium to execute it because it doesn't support the POST method (as far as I know).
Any ideas?
You can try using selenium-requests:
Extends Selenium WebDriver classes to include the request function
from the Requests library, while doing all the needed cookie and
request headers handling.
Example:
from seleniumrequests import Firefox
webdriver = Firefox()
response = webdriver.request('POST', 'url here', data={"param1": "value1"})
print(response)
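The object it returns is an ordinary requests Response, so the usual attributes apply:
print(response.status_code)
print(response.headers.get('Content-Type'))
print(response.text[:200])  # start of the body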
I want to log in with cookiejar and launch not the login page but a page that can only be seen after authentication. I know mechanize does that, but besides not working for me right now, I'd rather do this without it. Now I have:
import urllib, urllib2, cookielib, webbrowser
from cookielib import CookieJar
username = 'my_username'
password = 'my_password'
url = 'my_login_page'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'my_username' : username, 'my_password' : password})
opener.open(url, login_data)
page_to_launch = 'my_authenticated_url'
webbrowser.open(page_to_launch, new=1, autoraise=1)
I am either able to log in and dump the authenticated page to stdout, or to launch the login page without the cookie being recognized, but I am not able to launch the page I want after logging in. Help appreciated.
You could use the selenium module to do this. It starts a browser (Chrome, Firefox, IE, etc.) with an extension loaded into it that allows you to control the browser.
Here's how you load cookies into it:
from selenium import webdriver
driver = webdriver.Firefox() # open the browser
# Go to the correct domain
driver.get("http://www.example.com")
# Now set the cookie. Here's one for the entire domain
# the cookie name here is 'key' and its value is 'value'
driver.add_cookie({'name':'key', 'value':'value', 'path':'/'})
# additional keys that can be passed in are:
# 'domain' -> String,
# 'secure' -> Boolean,
# 'expiry' -> Seconds since the Unix epoch when it should expire.
# finally we visit the hidden page
driver.get('http://www.example.com/secret_page.html')
Your cookies aren't making it to the browser.
webbrowser has no facilities for accepting the cookies stored in your CookieJar instance. It's simply a generic interface for launching a browser with a URL. You will either have to implement a CookieJar that can store cookies in your browser (which is almost certainly no small task) or use an alternative library that solves this problem for you.
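If you want to keep the urllib2 login, one workaround (a sketch, not a drop-in solution) is to hand the CookieJar's cookies to a Selenium-driven browser instead of webbrowser:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get(url)  # visit the domain first so its cookies can be set
for cookie in cj:  # cj is the cookielib.CookieJar populated by the login
    driver.add_cookie({'name': cookie.name, 'value': cookie.value, 'path': cookie.path})
driver.get(page_to_launch)  # now loads as the authenticated user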