get cookies for www subdomain, or a particular domain? - python

I'm calling get_cookies() on my Selenium web driver. We know this fetches the cookies for the current domain. However, many popular sites set cookies on both example.com and www.example.com.
Technically, it's not really a "separate domain" or even a subdomain. Nearly every website on the internet serves the same site at the www subdomain as it does at the root.
So is it still impossible to save cookies for the two domains, since one is a subdomain? I know the answer is complicated if you want to save cookies for all domains, but I figured this is a bit different since they really are the same domain.
Replicate it with this code:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.instagram.com/")
print(driver.get_cookies())
output:
[{'name': 'ig_did', 'value': 'F5FDFBB0-7D13-4E4E-A100-C627BD1998B7', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': True, 'expiry': 1671083433}, {'name': 'mid', 'value': 'X9hOqQAEAAFWnsZg8-PeYdGqVcTU', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1671083433}, {'name': 'ig_nrcb', 'value': '1', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639547433}, {'name': 'csrftoken', 'value': 'Yy8Bew6500BinlUcAK232m7xPnhOuN4Q', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639461034}]
Then load the page in a fresh browser instance and check yourself. You'll see www is there.
The main domain looks fine though:

My idea is to use the requests library and fetch all the cookies via a plain HTTP request:
import requests
# Making a get request
response = requests.get('https://www.instagram.com/')
# printing request cookies
print(response.cookies)

Domain
To host your application on the internet you need a domain name. Domain names act as a placeholder for the complex string of numbers known as an IP address. As an example,
https://www.instagram.com/
With the latest Firefox v84.0 accessing the Instagram application, the following cookies are observed within the https://www.instagram.com domain:
Subdomain
A subdomain is an add-on to your primary domain name. For example, when using a site such as Craigslist, you are always using a subdomain like reno.craigslist.org or sfbay.craigslist.org, and you will automatically be forwarded to the subdomain that corresponds to your physical location. Essentially, a subdomain is a separate part of your website that operates under the same primary domain name.
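The practical consequence for cookies is how the Domain attribute is matched: a cookie set for `.instagram.com` (leading dot) covers `instagram.com` and every subdomain of it, including `www.instagram.com`. A rough sketch of the matching rule, simplified from RFC 6265 (the function name is mine):

```python
def domain_matches(cookie_domain, host):
    """True if a cookie with the given Domain attribute is sent to host."""
    d = cookie_domain.lstrip('.').lower()   # '.instagram.com' -> 'instagram.com'
    host = host.lower()
    # exact match, or host is a subdomain of the cookie domain
    return host == d or host.endswith('.' + d)
```

So a single `.instagram.com` cookie in the get_cookies() output already applies to both the root and the www host.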
Reusing cookies
If you have stored the cookies from domain example.com, these stored cookies can't be pushed through the webdriver session to any other domain, e.g. example.edu. The stored cookies can be used only within example.com. Further, to automatically log a user in later, you need to store the cookies only once, and that's when the user has logged in. Before adding the cookies back, you need to browse to the same domain from which the cookies were collected.
Demonstration
As an example, you can store the cookies once the user has logged in to an application as follows:
from selenium import webdriver
import pickle
driver = webdriver.Chrome()
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
driver.find_element_by_name("username").send_keys("abc123")
driver.find_element_by_name("password").send_keys("123xyz")
driver.find_element_by_name("submit").click()
# storing the cookies
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))
driver.quit()
Later, at any point when you want the user automatically logged in, you need to browse to the specific domain/URL first and then add the cookies as follows:
from selenium import webdriver
import pickle
driver = webdriver.Chrome()
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
# loading the stored cookies
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
    # adding the cookies to the session through the webdriver instance
    driver.add_cookie(cookie)
driver.get('http://demo.guru99.com/test/cookie/selenium_cookie.php')
Reference
You can find a detailed discussion in:
org.openqa.selenium.InvalidCookieDomainException: Document is cookie-averse using Selenium and WebDriver

Related

Move cookies from requests to selenium

I would like to move the cookies from a Python requests session to my Selenium browser. At the moment, I am doing this:
cookies = session.cookie_jar
for cookie in cookies:  # Add success cookies
    driver.add_cookie({'name': cookie.name, 'value': cookie.value, 'path': cookie.path, 'expiry': cookie.expires})
However, I get some errors like
AttributeError: 'Morsel' object has no attribute 'path'
How can I fix that?
Thanks.
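That AttributeError suggests the jar is yielding `Morsel` objects (the type used by the stdlib's `http.cookies`, and by aiohttp's cookie jar) rather than the `Cookie` objects a requests session yields. A Morsel's metadata is read by indexing, not attribute access, so a conversion could be sketched like this (the helper name is my own):

```python
from http.cookies import SimpleCookie

def morsel_to_selenium(name, morsel):
    """Build a driver.add_cookie()-style dict from a Morsel."""
    cookie = {'name': name, 'value': morsel.value}
    # Morsel metadata is accessed with [] (morsel['path']), not .path
    if morsel['path']:
        cookie['path'] = morsel['path']
    if morsel['domain']:
        cookie['domain'] = morsel['domain']
    return cookie
```

With a stdlib `SimpleCookie`, for instance, you would iterate `for name, morsel in sc.items(): driver.add_cookie(morsel_to_selenium(name, morsel))`.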

Selenium Add Cookies From CookieJar

I am trying to add Python requests session cookies to my Selenium webdriver.
This is what I have tried so far:
for c in self.s.cookies:
    driver.add_cookie({'name': c.name, 'value': c.value, 'path': c.path, 'expiry': c.expires})
This code works fine for PhantomJS, whereas it does not for Firefox and Chrome.
My Questions:
Is there any special way of iterating the cookiejar for Firefox and Chrome?
Why does it work for PhantomJS?
for cookie in s.cookies:  # session cookies
    # Setting domain to None automatically instructs most webdrivers to use
    # the domain of the current window handle
    cookie_dict = {'domain': None, 'name': cookie.name, 'value': cookie.value, 'secure': cookie.secure}
    if cookie.expires:
        cookie_dict['expiry'] = cookie.expires
    if cookie.path_specified:
        cookie_dict['path'] = cookie.path
    driver.add_cookie(cookie_dict)
Check this for a complete solution https://github.com/cryzed/Selenium-Requests/blob/master/seleniumrequests/request.py

Pythons requests session to open browser using selenium

I'm looking to use requests.Session and BeautifulSoup. If a specific status of 503 is identified, I want to then open that session in a web browser. The problem is I have no idea how to move a Python requests session into a browser using Selenium. Any guidance would be appreciated.
Requests sessions have CookieJar objects that you can use to import into Selenium.
For example:
from selenium import webdriver
import requests

driver = webdriver.Firefox()
s = requests.Session()
s.get('http://example.com')
for cookie in s.cookies:
    driver.add_cookie({
        'name': cookie.name,
        'value': cookie.value,
        'path': '/',
        'domain': cookie.domain,
    })
driver should now have all of the cookies (and therefore sessions) that Requests has.

Selenium: Trying to log in with cookies - "Can only set cookies for current domain"

What I am trying to achieve
I am trying to log in to a website where cookies must be enabled, using headless Selenium; I am using PhantomJS as the driver.
Problem
I first recorded the procedure using Selenium IDE, where it works fine using Firefox (not headless). Then I exported the code to Python, and now I can't log in because it's throwing an error saying "Can only set Cookies for the current domain". I don't know why I'm getting this problem; am I not on the correct domain?
Code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import unittest, time, re
self.driver = webdriver.PhantomJS()
self.driver.implicitly_wait(30)
self.base_url = "https://login.example.com"
driver = self.driver
driver.get(self.base_url)
all_cookies = self.driver.get_cookies()
# It prints out all cookies and values just fine
for cookie in all_cookies:
    print(cookie['name'] + " --> " + cookie['value'])
# Set cookies to driver
for s_cookie in all_cookies:
    c = {s_cookie['name']: s_cookie['value']}
    # This is where it's throwing an error saying "Can only set Cookies for current domain"
    driver.add_cookie(c)
...
What I've tried
I've tried saving the cookies in a dict, going to another domain, going back to the original domain, adding the cookies, and then trying to log in, but it still doesn't work (as suggested in this thread).
Any help is appreciated.
Investigate each cookie pair. I ran into a similar issue, and some of the cookies belonged to Google. You need to make sure the cookies are added only to the current domain and actually belong to that domain; otherwise your exception is expected. On a side note, if I recall correctly, you cannot use localhost when adding cookies; change it to the IP address. Also, investigate the cookies you are getting, especially the domain and expiry information, and see if they are returning null.
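One way to act on that advice is to filter the saved cookies down to the ones whose domain actually covers the host you are currently on, before calling add_cookie. A sketch under that assumption (the helper is hypothetical, not a Selenium API):

```python
from urllib.parse import urlparse

def cookies_for_host(cookies, current_url):
    """Keep only cookie dicts whose 'domain' covers the host of current_url."""
    host = (urlparse(current_url).hostname or '').lower()
    kept = []
    for c in cookies:
        d = c.get('domain', '').lstrip('.').lower()
        # exact match, or host is a subdomain of the cookie domain
        if d and (host == d or host.endswith('.' + d)):
            kept.append(c)
    return kept
```

You would then call something like `for c in cookies_for_host(all_cookies, driver.current_url): driver.add_cookie(c)`, which drops third-party cookies (e.g. Google's) that would trigger the exception.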
Edit
I did this simple test on Gmail to show what you have done wrong. At first look I did not notice that you are trying to grab a partial cookie, a single name/value pair, and add that to the domain. Since the cookie does not have any domain, path, expiry, etc. information, the driver was trying to add the cookie to the current domain (127.0.0.1) and throwing a misleading message that did not quite make sense. Notice: in order to be valid, a cookie must have the correct domain and expiry information, which yours has been missing.
import unittest
from selenium.webdriver.common.by import By
from selenium import webdriver

__author__ = 'Saifur'

class CookieManagerTest(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.PhantomJS("E:\\working\\selenium.python\\selenium\\resources\\phantomjs.exe")
        self.driver.get("https://accounts.google.com/ServiceLogin?service=mail&continue=https://mail.google.com/mail/")
        self.driver.find_element(By.ID, "Email").send_keys("userid")
        self.driver.find_element(By.ID, "next").click()
        self.driver.find_element(By.ID, "Passwd").send_keys("supersimplepassword")
        self.driver.find_element(By.CSS_SELECTOR, "[type='submit'][value='Sign in']").click()
        self.driver.maximize_window()

    def test(self):
        driver = self.driver
        listcookies = driver.get_cookies()
        for s_cookie in listcookies:
            # this is what you are doing
            c = {s_cookie['name']: s_cookie['value']}
            print("*****The partial cookie info you are doing*****\n")
            print(c)
            # Should be done
            print("The Full Cookie including domain and expiry info\n")
            print(s_cookie)
            # driver.add_cookie(s_cookie)

    def tearDown(self):
        self.driver.quit()
Console output:
D:\Python34\python.exe "D:\Program Files (x86)\JetBrains\PyCharm Educational Edition 1.0.1\helpers\pycharm\utrunner.py" E:\working\selenium.python\selenium\python\FirstTest.py::CookieManagerTest true
Testing started at 9:59 AM ...
*******The partial cookie info you are doing*******
{'PREF': 'ID=*******:FF=0:LD=en:TM=*******:LM=*******:GM=1:S=*******'}
The Full Cookie including domain and expiry info
{'httponly': False, 'name': '*******', 'value': 'ID=*******:FF=0:LD=en:TM=*******:LM=1432393656:GM=1:S=iNakWMI5h_2cqIYi', 'path': '/', 'expires': 'Mon, 22 May 2017 15:07:36 GMT', 'secure': False, 'expiry': *******, 'domain': '.google.com'}
Notice: I just replaced some info with ******* on purpose
I was going to just add a comment onto the bottom of what @Saifur said above, but I figured I had enough new content to warrant a full answer.
The revelation for me, having this exact same error, was that using Selenium works exactly the same as if you're actually opening your browser and physically clicking and typing. With this in mind, if you enter the user/pass into Selenium and press click(), your Selenium driver will, upon successful authentication, automatically have the cookie in it, negating any need to smash in my saved (probably soon-to-expire) cookie. I felt a little silly realizing this. Made everything so much simpler.
Using @Saifur's code above as a template, I've made some adjustments and removed what I feel is a bit excessive: the extra class wrapped around the execution in this example.
from selenium import webdriver

url = 'http://domainname.com'
url2 = 'http://domainname.com/page'
USER = 'superAwesomeRobot'
PASS = 'superSecretRobot'

# initiates your browser
driver = webdriver.PhantomJS()
# browses to your desired URL
driver.get(url)
# searches for the user or email field on the page, and inputs USER
driver.find_element_by_id("email").send_keys(USER)
# searches for the password field on the page, and inputs PASS
driver.find_element_by_id("pass").send_keys(PASS)
# finds the login button and click you're in
driver.find_element_by_id("loginbutton").click()
From here you can browse to the page you want to address:
driver.get(url2)
Note: if you have a modern site that auto-loads content when you scroll down, it might be handy to use this:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I would also like to note, @simeg, that Selenium is supposed to wait automatically until the page has loaded (and yes, I've had the AJAX issue being referred to, so sometimes it is necessary to wait a few seconds; what page takes more than 30 seconds to load?!). The way you're running your wait command just waits for PhantomJS itself to load, not the actual page, so it seems of no use to me considering the built-in behavior:
The driver.get method will navigate to a page given by the URL. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script. It’s worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded.
source: http://selenium-python.readthedocs.io/getting-started.html#example-explained
Hope this helps somebody!
Some webpages set cookie keys that webdriver does not support, and you get an "errorMessage":"Can only set Cookies for the current domain", even though you are 100% sure that you are setting cookies for the current domain. An example of such a webpage is "https://stackoverflow.com/". In this case, you need to make sure that only the required keys are added to the cookies, as mentioned in some previous posts.
driver.add_cookie({k: cookie[k] for k in ('name', 'value', 'domain', 'path', 'expiry') if k in cookie})
In contrast, some webpages set too few of the keys webdriver requires, and you get the same "errorMessage":"Can only set Cookies for the current domain" after fixing the first problem. An example of such a webpage is "https://github.com/". You need to add the 'expiry' key to the cookies for this webpage.
for k in ('name', 'value', 'domain', 'path', 'expiry'):
    if k not in cookie:
        if k == 'expiry':
            cookie[k] = 1475825481
Putting them all together, the complete code is as below:
# uncomment one of the following three URLs to test
#targetURL = "http://pythonscraping.com"
targetURL = "https://stackoverflow.com/"
#targetURL = "https://github.com/"

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(targetURL)
driver.implicitly_wait(1)
#print(driver.get_cookies())
savedCookies = driver.get_cookies()

driver2 = webdriver.PhantomJS()
driver2.get(targetURL)
driver2.implicitly_wait(1)
driver2.delete_all_cookies()

for cookie in savedCookies:
    # fix the 2nd problem
    for k in ('name', 'value', 'domain', 'path', 'expiry'):
        if k not in cookie:
            if k == 'expiry':
                cookie[k] = 1475825481
    # fix the 1st problem
    driver2.add_cookie({k: cookie[k] for k in ('name', 'value', 'domain', 'path', 'expiry') if k in cookie})
    print(cookie)

driver2.get(targetURL)
driver2.implicitly_wait(1)
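Both fixes can be folded into one small helper, sketched below. The fallback timestamp 1475825481 is the arbitrary one used in the code above; in practice you would pick a date safely in the future, since some drivers reject already-expired cookies.

```python
ALLOWED_KEYS = ('name', 'value', 'domain', 'path', 'expiry')

def normalize_cookie(cookie, default_expiry=1475825481):
    """Whitelist the keys webdriver accepts and backfill a missing expiry."""
    fixed = {k: cookie[k] for k in ALLOWED_KEYS if k in cookie}
    fixed.setdefault('expiry', default_expiry)
    return fixed
```

The loop then becomes `driver2.add_cookie(normalize_cookie(cookie))` for each saved cookie.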

Python scrapy login using session cookie

I'm trying to scrape sites after authentication. I was able to take the JSESSIONID cookie from an authenticated browser session and download the correct page using an opener, like below.
import cookielib, urllib2
cj = cookielib.CookieJar()
c1 = cookielib.Cookie(None, "JSESSIONID", SESSIONID, None, None, DOMAIN,
                      True, False, "/store", True, False, None, False, None, None, None)
cj.set_cookie(c1)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fh = opener.open(url)
But when I use this code to create Scrapy requests (I tried both dict cookies and a cookiejar), the downloaded page is the non-authenticated version. Does anyone know what the problem is?
cookies = [{
'name': 'JSESSIONID',
'value': SESSIONID,
'path': '/store',
'domain': DOMAIN,
'secure': False,
}]
request1 = Request(url, cookies=cookies, meta={'dont_merge_cookies': False})
request2 = Request(url, meta={'dont_merge_cookies': True, 'cookiejar': cj})
You were able to get the JSESSIONID from your browser.
Why not let Scrapy simulate a user login for you?
Then, I think your JSESSIONID cookie will stick to subsequent requests, given that:
Scrapy uses a single cookie jar (as opposed to multiple cookie sessions per spider) for the entire spider lifetime, containing all your scraping steps,
the COOKIES_ENABLED setting for the cookie middleware defaults to True,
dont_merge_cookies defaults to False:
When some site returns cookies (in a response) those are stored in the
cookies for that domain and will be sent again in future requests.
That’s the typical behaviour of any regular web browser. However, if,
for some reason, you want to avoid merging with existing cookies you
can instruct Scrapy to do so by setting the dont_merge_cookies key to
True in the Request.meta.
Example of request without merging cookies:
request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'},
                               meta={'dont_merge_cookies': True})
