I would like to move the cookies in a python requests session to my selenium browser. At the moment, I am doing this:
cookies = session.cookie_jar
for cookie in cookies:  # add the session's cookies
    driver.add_cookie({'name': cookie.name, 'value': cookie.value, 'path': cookie.path, 'expiry': cookie.expires})
However, I get some errors like
AttributeError: 'Morsel' object has no attribute 'path'
How can I fix that?
Thanks.
I am trying to add python requests session cookies to my selenium webdriver.
I have tried this so far:
for c in self.s.cookies:
    driver.add_cookie({'name': c.name, 'value': c.value, 'path': c.path, 'expiry': c.expires})
This code works fine with PhantomJS, but not with Firefox or Chrome.
My Questions:
Is there any special way to iterate over the cookie jar for Firefox and Chrome?
Why does it work for PhantomJS?
for cookie in s.cookies:  # session cookies
    # Setting domain to None automatically instructs most webdrivers to use
    # the domain of the current window handle
    cookie_dict = {'domain': None, 'name': cookie.name, 'value': cookie.value, 'secure': cookie.secure}
    if cookie.expires:
        cookie_dict['expiry'] = cookie.expires
    if cookie.path_specified:
        cookie_dict['path'] = cookie.path
    driver.add_cookie(cookie_dict)
Check this for a complete solution: https://github.com/cryzed/Selenium-Requests/blob/master/seleniumrequests/request.py
I'm looking to use requests.Session and BeautifulSoup. If a specific status of 503 is identified, I want to then open that session in a web browser. The problem is I have no idea how to move a Python requests session into a browser using Selenium. Any guidance would be appreciated.
Requests sessions have CookieJar objects that you can import into Selenium.
For example:
import requests
from selenium import webdriver

driver = webdriver.Firefox()

s = requests.Session()
s.get('http://example.com')

for cookie in s.cookies:
    driver.add_cookie({
        'name': cookie.name,
        'value': cookie.value,
        'path': '/',
        'domain': cookie.domain,
    })
driver should now have all of the cookies (and therefore sessions) that Requests has.
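One caveat: most drivers will only let you set a cookie for the domain of the page currently loaded, so navigate the browser to the site first. A minimal sketch of the whole round trip (example.com is just a placeholder):
import requests
from selenium import webdriver

s = requests.Session()
s.get('http://example.com')  # requests collects the session cookies here

driver = webdriver.Firefox()
# load a page on the target site first; drivers typically refuse to set
# cookies for a domain the browser is not currently on
driver.get('http://example.com')
for cookie in s.cookies:
    driver.add_cookie({'name': cookie.name, 'value': cookie.value, 'path': '/'})
driver.get('http://example.com')  # reload so the copied cookies take effect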
What I am trying to achieve
I am trying to log in to a website where cookies must be enabled, using headless Selenium; I am using PhantomJS as the driver.
Problem
I first recorded the procedure using Selenium IDE, where it works fine using Firefox (not headless). Then I exported the code to Python, and now I can't log in because it throws an error saying "Can only set Cookies for the current domain". I don't know why I am getting this problem; am I not on the correct domain?
Code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import unittest, time, re
self.driver = webdriver.PhantomJS()
self.driver.implicitly_wait(30)
self.base_url = "https://login.example.com"

driver = self.driver
driver.get(self.base_url)

all_cookies = self.driver.get_cookies()

# It prints out all cookies and values just fine
for cookie in all_cookies:
    print cookie['name'] + " --> " + cookie['value']

# Set cookies to driver
for s_cookie in all_cookies:
    c = {s_cookie['name']: s_cookie['value']}
    # This is where it throws the error "Can only set Cookies for current domain"
    driver.add_cookie(c)
...
What I've tried
I've tried saving the cookies in a dict, going to another domain, going back to the original domain, and adding the cookies, then trying to log in, but it still doesn't work (as suggested in this thread).
Any help is appreciated.
Investigate each cookie pair. I ran into a similar issue, and some of the cookies belonged to Google. You need to make sure cookies are added only to the current domain and also belong to that same domain; otherwise your exception is expected. On a side note, if I recall correctly, you cannot use localhost when adding cookies; change it to the IP address. Also, investigate the cookies you are getting, especially the domain and expiry information, and see if they are returning null.
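For example, here is a rough sketch of filtering all_cookies from the snippet above so only entries matching the current domain are re-added (the domain check is simplified, and it assumes the driver is already on the target page):
from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse

current_host = urlparse(driver.current_url).hostname or ''
for s_cookie in all_cookies:
    domain = (s_cookie.get('domain') or '').lstrip('.')
    # skip cookies (e.g. Google's) whose domain does not match the current page
    if domain and not current_host.endswith(domain):
        continue
    driver.add_cookie(s_cookie)  # add the full cookie dict, not a partial pair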
Edit
I did this simple test on Gmail to show what you have done wrong. At first look I did not notice that you are trying to grab a partial cookie, a name-value pair, and add that to the domain. Since the cookie does not have any domain, path, expiry, etc. information, it was trying to add the cookie to the current domain (127.0.0.1) and throwing some misleading info that did not quite make sense. Notice: in order to be valid, a cookie must have the correct domain and expiry information, which you have been missing.
import unittest
from selenium.webdriver.common.by import By
from selenium import webdriver

__author__ = 'Saifur'


class CookieManagerTest(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.PhantomJS("E:\\working\\selenium.python\\selenium\\resources\\phantomjs.exe")
        self.driver.get("https://accounts.google.com/ServiceLogin?service=mail&continue=https://mail.google.com/mail/")
        self.driver.find_element(By.ID, "Email").send_keys("userid")
        self.driver.find_element(By.ID, "next").click()
        self.driver.find_element(By.ID, "Passwd").send_keys("supersimplepassword")
        self.driver.find_element(By.CSS_SELECTOR, "[type='submit'][value='Sign in']").click()
        self.driver.maximize_window()

    def test(self):
        driver = self.driver
        listcookies = driver.get_cookies()

        for s_cookie in listcookies:
            # this is what you are doing
            c = {s_cookie['name']: s_cookie['value']}
            print("*****The partial cookie info you are doing*****\n")
            print(c)

            # what should be done instead
            print("The Full Cookie including domain and expiry info\n")
            print(s_cookie)
            # driver.add_cookie(s_cookie)

    def tearDown(self):
        self.driver.quit()
Console output:
D:\Python34\python.exe "D:\Program Files (x86)\JetBrains\PyCharm Educational Edition 1.0.1\helpers\pycharm\utrunner.py" E:\working\selenium.python\selenium\python\FirstTest.py::CookieManagerTest true
Testing started at 9:59 AM ...
*******The partial cookie info you are doing*******
{'PREF': 'ID=*******:FF=0:LD=en:TM=*******:LM=*******:GM=1:S=*******'}
The Full Cookie including domain and expiry info
{'httponly': False, 'name': '*******', 'value': 'ID=*******:FF=0:LD=en:TM=*******:LM=1432393656:GM=1:S=iNakWMI5h_2cqIYi', 'path': '/', 'expires': 'Mon, 22 May 2017 15:07:36 GMT', 'secure': False, 'expiry': *******, 'domain': '.google.com'}
Notice: I just replaced some info with ******* on purpose
I was going to just add a comment onto the bottom of what @Saifur said above, but I figured I had enough new content to warrant a full answer.
The revelation for me, having this exact same error, was that Selenium works exactly the same as if you were actually opening up your browser and physically clicking and typing things. With this in mind, if you enter the user/pass into Selenium and press click(), your Selenium driver will, upon successful authentication, automatically have the cookie in it, negating any need to smash in my saved (probably soon to expire) cookie. I felt a little silly realizing this. It made everything so much simpler.
Using @Saifur's code above as a template, I've made some adjustments and removed what I feel is a bit excessive: an extra whole class for the execution in this example.
from selenium import webdriver

url = 'http://domainname.com'
url2 = 'http://domainname.com/page'
USER = 'superAwesomeRobot'
PASS = 'superSecretRobot'

# initiates your browser
driver = webdriver.PhantomJS()

# browses to your desired URL
driver.get(url)

# searches for the user or email field on the page, and inputs USER
driver.find_element_by_id("email").send_keys(USER)

# searches for the password field on the page, and inputs PASS
driver.find_element_by_id("pass").send_keys(PASS)

# finds the login button and clicks it; you're in
driver.find_element_by_id("loginbutton").click()
From here you can browse to the page you want to address:
driver.get(url2)
Note: if you have a modern site that auto-loads content as you scroll down, it might be handy to use this:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I would also like to note, @simeg, that Selenium is supposed to wait automatically until the page has loaded (and yes, I've had the AJAX issue being referred to, so sometimes it is necessary to wait a few seconds; what page takes more than 30 seconds to load?!). The way you're running your wait command just waits for PhantomJS itself to start, not for the actual page to load, so it seems of no use to me considering the built-in behavior:
The driver.get method will navigate to a page given by the URL. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script. It’s worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded.
source: http://selenium-python.readthedocs.io/getting-started.html#example-explained
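If you do need to wait for something AJAX-loaded, an explicit wait on a concrete element is the usual tool; here is a sketch, where 'logged-in-header' is a made-up element id standing in for whatever your post-login page actually renders:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the marker element to appear
# ('logged-in-header' is hypothetical; substitute an id from your own page)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'logged-in-header'))
)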
Hope this helps somebody!
Some webpages use too many keys in their cookies, ones not supported by WebDriver, and you then get an "errorMessage":"Can only set Cookies for the current domain", even though you are 100% sure that you are setting cookies for the current domain. An example of such a webpage is https://stackoverflow.com/. In this case, you need to make sure that only the required keys are added to the cookies, as mentioned in some previous posts.
driver.add_cookie({k: cookie[k] for k in ('name', 'value', 'domain', 'path', 'expiry') if k in cookie})
In contrast, some webpages use too few keys in their cookies, omitting ones that WebDriver requires; then you get the same "errorMessage":"Can only set Cookies for the current domain" after you fix the first problem. An example of such a webpage is https://github.com/. You need to add the 'expiry' key to the cookies for this webpage.
for k in ('name', 'value', 'domain', 'path', 'expiry'):
    if k not in list(cookie.keys()):
        if k == 'expiry':
            cookie[k] = 1475825481
Putting it all together, the complete code is below:
# uncomment one of the following three URLs to test
#targetURL = "http://pythonscraping.com"
targetURL = "https://stackoverflow.com/"
#targetURL = "https://github.com/"
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get(targetURL)
driver.implicitly_wait(1)
#print(driver.get_cookies())
savedCookies = driver.get_cookies()
driver2 = webdriver.PhantomJS()
driver2.get(targetURL)
driver2.implicitly_wait(1)
driver2.delete_all_cookies()
for cookie in savedCookies:
    # fix the 2nd problem: fill in a missing 'expiry' key
    for k in ('name', 'value', 'domain', 'path', 'expiry'):
        if k not in list(cookie.keys()):
            if k == 'expiry':
                cookie[k] = 1475825481

    # fix the 1st problem: drop keys WebDriver does not support
    driver2.add_cookie({k: cookie[k] for k in ('name', 'value', 'domain', 'path', 'expiry') if k in cookie})
    print(cookie)
driver2.get(targetURL)
driver2.implicitly_wait(1)
I'm trying to scrape sites after authentication. I was able to take the JSESSIONID cookie from an authenticated browser session and download the correct page using a urllib2 opener, as below.
import cookielib, urllib2
cj = cookielib.CookieJar()
c1 = cookielib.Cookie(None, "JSESSIONID", SESSIONID, None, None, DOMAIN,
                      True, False, "/store", True, False, None, False, None, None, None)
cj.set_cookie(c1)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fh = opener.open(url)
But when I use this code for creating scrapy requests (tried both dict cookies and cookiejar), the downloaded page is the non-authenticated version. Anyone know what the problem is?
cookies = [{
    'name': 'JSESSIONID',
    'value': SESSIONID,
    'path': '/store',
    'domain': DOMAIN,
    'secure': False,
}]
request1 = Request(url, cookies=self.cookies, meta={'dont_merge_cookies': False})
request2 = Request(url, meta={'dont_merge_cookies': True, 'cookiejar': cj})
You were able to get the JSESSIONID from your browser.
Why not let Scrapy simulate a user login for you? (See the sketch at the end of this answer.)
Then, I think your JSESSIONID cookie will stick to subsequent requests, given that:
- Scrapy uses a single cookie jar (as opposed to Multiple cookie sessions per spider) for the entire spider lifetime, containing all your scraping steps,
- the COOKIES_ENABLED setting for the cookie middleware defaults to true,
- dont_merge_cookies defaults to false:
When some site returns cookies (in a response) those are stored in the cookies for that domain and will be sent again in future requests. That’s the typical behaviour of any regular web browser. However, if, for some reason, you want to avoid merging with existing cookies you can instruct Scrapy to do so by setting the dont_merge_cookies key to True in the Request.meta.
Example of request without merging cookies:
request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'},
                               meta={'dont_merge_cookies': True})
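To make the simulated-login suggestion above concrete, here is a rough sketch using FormRequest.from_response; the URL and form field names are placeholders, not taken from the original question:
import scrapy

class StoreSpider(scrapy.Spider):
    name = 'store'
    start_urls = ['http://example.com/login']  # placeholder login page

    def parse(self, response):
        # submit the login form; 'username'/'password' are hypothetical field names
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'user', 'password': 'pass'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # the JSESSIONID set by the server now lives in Scrapy's cookie jar
        # and is sent automatically with every subsequent request
        return scrapy.Request('http://example.com/store', callback=self.parse_store)

    def parse_store(self, response):
        self.log('fetched authenticated page: %s' % response.url)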