I am using Python with the Selenium Chrome WebDriver:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
and
self.driver = webdriver.Chrome(chrome_options=options, desired_capabilities=capabilities)
print self.driver.get('https://192.168.178.20:1337/login?email=me#domain.com&password=mypassword')
print self.driver.get('https://192.168.178.20:1337/this/that?name=john')
Previously I didn't need to authenticate and my GET went through, but now I have to authenticate with a PUT request that carries email and password params. I have tested the PUT in Postman and it worked fine.
Once authenticated, I want to browse to another URL using GET, but I am getting a 500, most likely because the session didn't retain my authentication.
How can I check that my login worked? How do I retrieve the response?
Do I need to retrieve and save some kind of token or cookie for the 2nd request to go through?
Console log
headless_chrome > Auth
None
headless_chrome > GET
None
headless_chrome > CONSOLE
headless_chrome console > {u'source': u'network', u'message': u'https://192.168.178.20:1337/this/that?name=john 0:0 Failed to load resource: the server responded with a status of 500 (Internal Server Error)', u'timestamp': 1479212713208, u'level': u'SEVERE'}
headless_chrome > title:
{'_file_detector': <selenium.webdriver.remote.file_detector.LocalFileDetector object at 0x7f8507a25f50>,
'_is_remote': False,
'_mobile': <selenium.webdriver.remote.mobile.Mobile object at 0x7f8507a25d90>,
'_switch_to': <selenium.webdriver.remote.switch_to.SwitchTo instance at 0x7f8507a336c8>,
'capabilities': {u'acceptSslCerts': True,
u'applicationCacheEnabled': False,
u'browserConnectionEnabled': False,
u'browserName': u'chrome',
u'chrome': {u'chromedriverVersion': u'2.21.371461 (633e689b520b25f3e264a2ede6b74ccc23cb636a)',
u'userDataDir': u'/tmp/.com.google.Chrome.ybR9Fm'},
u'cssSelectorsEnabled': True,
u'databaseEnabled': False,
u'handlesAlerts': True,
u'hasTouchScreen': False,
u'javascriptEnabled': True,
u'locationContextEnabled': True,
u'mobileEmulationEnabled': False,
u'nativeEvents': True,
u'platform': u'Linux',
u'rotatable': False,
u'takesHeapSnapshot': True,
u'takesScreenshot': True,
u'version': u'50.0.2661.102',
u'webStorageEnabled': True},
'command_executor': <selenium.webdriver.chrome.remote_connection.ChromeRemoteConnection object at 0x7f8507a25cd0>,
'error_handler': <selenium.webdriver.remote.errorhandler.ErrorHandler object at 0x7f8507a25d50>,
'service': <selenium.webdriver.chrome.service.Service object at 0x7f85083bc3d0>,
'session_id': u'72a5ce48d950be26b3f33de4adb34428',
'w3c': False}
headless_chrome > except else
clean up Selenium browser
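Selenium WebDriver has no .put method; driver.get() is the only way to load a URL. You should use .get both for authentication and for navigating to the other URL afterwards. Ideally this should work: if the login works manually in the browser, it should work through WebDriver too, because the same browser session keeps its cookies between .get calls.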
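Note that driver.get() returns None, which is why printing it in the question only shows None; whether the login worked has to be read from the browser state instead. A minimal sketch, assuming the server keeps the login in a cookie (SESSION_COOKIE = "sessionid" is a hypothetical name, not one the question's server is known to use) and that the login URL accepts a GET as suggested above:

```python
# Minimal sketch; SESSION_COOKIE is a hypothetical cookie name, replace it with
# whatever cookie your server actually sets after a successful login.
SESSION_COOKIE = "sessionid"


def login_succeeded(cookies, session_cookie=SESSION_COOKIE):
    """True if the list returned by driver.get_cookies() holds a non-empty session cookie."""
    return any(c.get("name") == session_cookie and c.get("value") for c in cookies)


def login_then_browse(driver, login_url, next_url, session_cookie=SESSION_COOKIE):
    """Log in with a GET (WebDriver cannot send PUT), check the result, then navigate."""
    driver.get(login_url)  # returns None, so there is no response object to print
    if not login_succeeded(driver.get_cookies(), session_cookie):
        raise RuntimeError("login did not set %r; now at %s" % (session_cookie, driver.current_url))
    driver.get(next_url)  # same browser session, so its cookies are sent along
    return driver.page_source  # HTML of the page that was loaded
```

With the driver from the question this would be called as login_then_browse(self.driver, login_url, 'https://192.168.178.20:1337/this/that?name=john').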
Related
I'm calling get_cookies() on my selenium web driver. Of course we know this fetches the cookies for the current domain. However, many popular sites set cookies on both example.com and www.example.com.
Technically, it's not really a "separate domain" or even a subdomain. I think nearly every website on the internet serves the same site at the www subdomain as at the root.
So is it still impossible to save cookies for the two domains, since one is a subdomain? I know the answer is complicated if you want to save cookies for all domains, but I figured this case is different since they really are the same domain.
Replicate it with this code:
from selenium import webdriver
import requests
driver = webdriver.Firefox()
driver.get("https://www.instagram.com/")
print(driver.get_cookies())
output:
[{'name': 'ig_did', 'value': 'F5FDFBB0-7D13-4E4E-A100-C627BD1998B7', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': True, 'expiry': 1671083433}, {'name': 'mid', 'value': 'X9hOqQAEAAFWnsZg8-PeYdGqVcTU', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1671083433}, {'name': 'ig_nrcb', 'value': '1', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639547433}, {'name': 'csrftoken', 'value': 'Yy8Bew6500BinlUcAK232m7xPnhOuN4Q', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639461034}]
Then load the page in a fresh browser instance and check for yourself: you'll see that www is listed there too.
The main domain looks fine though:
My idea is to use the requests library and get all cookies via a plain HTTP request. Would that work?
import requests
# Making a get request
response = requests.get('https://www.instagram.com/')
# printing request cookies
print(response.cookies)
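Printing response.cookies only shows the jar's repr. Iterating over it yields http.cookiejar.Cookie objects (requests' RequestsCookieJar is a subclass of http.cookiejar.CookieJar), and each cookie's domain attribute tells you whether it was set for .instagram.com or for www.instagram.com only. A small sketch; the helper name is illustrative:

```python
import http.cookiejar
from collections import defaultdict


def cookies_by_domain(jar: http.cookiejar.CookieJar) -> dict:
    """Group cookie names by the domain they were set for.

    Works for response.cookies / session.cookies from requests, because
    RequestsCookieJar subclasses http.cookiejar.CookieJar.
    """
    grouped = defaultdict(list)
    for cookie in jar:  # iterating a CookieJar yields Cookie objects
        grouped[cookie.domain].append(cookie.name)
    return {domain: sorted(names) for domain, names in grouped.items()}
```

Usage: print(cookies_by_domain(response.cookies)).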
Domain
To host your application on the internet, you need a domain name. Domain names act as a placeholder for the complex string of numbers known as an IP address. As an example:
https://www.instagram.com/
When accessing the Instagram application with the latest Firefox (v84.0), the following cookies are observed within the https://www.instagram.com domain:
Subdomain
A subdomain is an add-on to your primary domain name. For example, when using sites such as Craigslist, you are always using a subdomain like reno.craigslist.org or sfbay.craigslist.org, and you are automatically forwarded to the subdomain that corresponds to your physical location. Essentially, a subdomain is a separate part of your website that operates under the same primary domain name.
Reusing cookies
If you have stored the cookies from domain example.com, these stored cookies can't be pushed through the webdriver session to any other domain, e.g. example.edu. The stored cookies can be used only within example.com. Further, to log a user in automatically later on, you need to store the cookies only once, right after the user has logged in. Before adding the cookies back, you need to browse to the same domain from which the cookies were collected.
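Whether a stored cookie is sent to example.com, www.example.com, or both depends on its domain field. Below is a simplified version of the RFC 6265 domain-matching rule, assuming (as in the get_cookies() output earlier in this thread) that domain cookies are reported with a leading dot and host-only cookies without one; the function name is illustrative:

```python
def cookie_applies_to_host(cookie_domain: str, host: str) -> bool:
    """Simplified domain-match check in the spirit of RFC 6265, section 5.1.3.

    A domain cookie (leading dot, e.g. '.instagram.com') is sent to that domain
    and every subdomain of it; a host-only cookie (no leading dot) is sent only
    to exactly that host.
    """
    host = host.lower()
    domain = cookie_domain.lower()
    if domain.startswith("."):
        base = domain[1:]
        return host == base or host.endswith("." + base)
    return host == domain
```

So the '.instagram.com' cookies from the question already cover both instagram.com and www.instagram.com; only host-only cookies are tied to one of them.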
Demonstration
As an example, you can store the cookies once the user has logged in within an application as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
import pickle
driver = webdriver.Chrome()
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
# find_element_by_name() was removed in Selenium 4.3; use find_element(By.NAME, ...)
driver.find_element(By.NAME, "username").send_keys("abc123")
driver.find_element(By.NAME, "password").send_keys("123xyz")
driver.find_element(By.NAME, "submit").click()
# storing the cookies (the with block closes the file)
with open("cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)
driver.quit()
Later, whenever you want the user to be logged in automatically, first browse to the specific domain/URL and then add the cookies as follows:
from selenium import webdriver
import pickle
driver = webdriver.Chrome()
# cookies can only be added for the domain that is currently loaded
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
# loading the stored cookies
with open("cookies.pkl", "rb") as f:
    cookies = pickle.load(f)
for cookie in cookies:
    # adding the cookies to the session through webdriver instance
    driver.add_cookie(cookie)
driver.get('http://demo.guru99.com/test/cookie/selenium_cookie.php')
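pickle works here, but the Python documentation warns that unpickling untrusted data can execute arbitrary code. Since get_cookies() returns a plain list of dicts, JSON is sufficient; a sketch with illustrative helper names:

```python
import json


def serialize_cookies(cookies: list) -> str:
    """Turn the list returned by driver.get_cookies() into a JSON string."""
    return json.dumps(cookies, indent=2)


def deserialize_cookies(text: str) -> list:
    """Inverse of serialize_cookies(); each dict can be passed to driver.add_cookie()."""
    return json.loads(text)


def save_cookies(path: str, cookies: list) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(serialize_cookies(cookies))


def load_cookies(path: str) -> list:
    with open(path, encoding="utf-8") as fh:
        return deserialize_cookies(fh.read())
```

Use save_cookies("cookies.json", driver.get_cookies()) in place of pickle.dump and load_cookies("cookies.json") in place of pickle.load; the driver still has to visit the same domain before add_cookie() is called.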
Reference
You can find a detailed discussion in:
org.openqa.selenium.InvalidCookieDomainException: Document is cookie-averse using Selenium and WebDriver
I'm trying to write in some defensive code to prevent someone from executing a script should they have an older version of geckodriver installed. I cannot for the life of me seem to get the geckodriver version from the webdriver object.
The closest I found is driver.capabilities which contains the firefox browser version, but not the geckodriver version.
from pprint import pprint
from selenium import webdriver
driver = webdriver.Firefox()
pprint(driver.capabilities)
output:
{'acceptInsecureCerts': True,
'browserName': 'firefox',
'browserVersion': '60.0',
'moz:accessibilityChecks': False,
'moz:headless': False,
'moz:processID': 18584,
'moz:profile': '/var/folders/qz/0dsxssjd1133p_y44qbdszn00000gp/T/rust_mozprofile.GsKFWZ9kFgMT',
'moz:useNonSpecCompliantPointerOrigin': False,
'moz:webdriverClick': True,
'pageLoadStrategy': 'normal',
'platformName': 'darwin',
'platformVersion': '17.5.0',
'rotatable': False,
'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000}}
Is it possible the browser version and geckodriver version are linked directly? If not, how can I check the geckodriver version from within Python?
There is no method in the Python bindings to get the geckodriver version, so you will have to implement it yourself. My first option would be subprocess:
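import subprocess
# Mind the encoding, it must match your system's
output = subprocess.run(['geckodriver', '-V'], stdout=subprocess.PIPE, encoding='utf-8')
# Newer releases print e.g. "geckodriver 0.26.0 (<commit> <date>)", so take the second token
version = output.stdout.splitlines()[0].split()[1]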
It appears that moz:geckodriverVersion was added to the capabilities some time in late 2018:
driverversion = driver.capabilities['moz:geckodriverVersion']
browserversion = driver.capabilities['browserVersion']
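For the defensive check itself, both sources give you a string. A sketch that extracts the x.y.z number and compares it with a minimum; MIN_GECKODRIVER is a hypothetical threshold, set it to whatever version your script actually needs:

```python
import re

# Hypothetical minimum version; replace with whatever your script requires.
MIN_GECKODRIVER = (0, 23, 0)


def parse_geckodriver_version(text: str) -> tuple:
    """Return (major, minor, patch) from 'geckodriver 0.26.0 (...)' or '0.26.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def geckodriver_is_recent_enough(text: str, minimum: tuple = MIN_GECKODRIVER) -> bool:
    """Compare as integer tuples, so 0.10.0 correctly counts as newer than 0.9.1."""
    return parse_geckodriver_version(text) >= minimum
```

Pass it either the first line of the subprocess output or driver.capabilities['moz:geckodriverVersion'], and abort (for example with sys.exit) when it returns False. Comparing tuples rather than strings avoids the trap where "0.10.0" < "0.9.1" lexicographically.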
I am trying to add python requests session cookies to my selenium webdriver.
I have tried this so far
for c in self.s.cookies:
    driver.add_cookie({'name': c.name, 'value': c.value, 'path': c.path, 'expiry': c.expires})
This code works fine for PhantomJS, but not for Firefox and Chrome.
My Questions:
Is there a special way to iterate over the cookie jar for Firefox and Chrome?
Why does it work for PhantomJS?
for cookie in s.cookies:  # session cookies
    # Setting domain to None automatically instructs most webdrivers to use the domain of the current window handle
    cookie_dict = {'domain': None, 'name': cookie.name, 'value': cookie.value, 'secure': cookie.secure}
    if cookie.expires:
        cookie_dict['expiry'] = cookie.expires
    if cookie.path_specified:
        cookie_dict['path'] = cookie.path
    driver.add_cookie(cookie_dict)
Check this for a complete solution https://github.com/cryzed/Selenium-Requests/blob/master/seleniumrequests/request.py
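The same idea as a reusable function, sketched under an assumption the question does not confirm: that Firefox and Chrome reject the question's dicts because of None values (c.expires is None for session cookies). Only fields that carry a value are copied, and host-only cookies get no domain, so the driver falls back to the domain of the page currently loaded. The function name is illustrative:

```python
import http.cookiejar


def to_selenium_cookie(cookie: http.cookiejar.Cookie) -> dict:
    """Convert one cookie from a requests / http.cookiejar jar into an add_cookie() dict."""
    result = {"name": cookie.name, "value": cookie.value, "secure": bool(cookie.secure)}
    # Only pass a domain the server set explicitly; otherwise the driver uses the current page's.
    if cookie.domain_specified and cookie.domain:
        result["domain"] = cookie.domain
    if cookie.path_specified and cookie.path:
        result["path"] = cookie.path
    # Session cookies have expires=None; leave 'expiry' out instead of sending None.
    if cookie.expires is not None:
        result["expiry"] = int(cookie.expires)
    return result
```

Usage, after driver.get() on the cookies' domain: for c in s.cookies: driver.add_cookie(to_selenium_cookie(c)).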
I'm looking to use requests.session and BeautifulSoup. If a 503 status is returned, I want to then open that session in a web browser. The problem is I have no idea how to move a Python requests session into a browser using Selenium. Any guidance would be appreciated.
Requests sessions have CookieJar objects that you can use to import into Selenium.
For example:
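import requests
from selenium import webdriver
driver = webdriver.Firefox()
s = requests.Session()
s.get('http://example.com')
# add_cookie() only works for the domain currently loaded in the browser,
# so open a page on that domain first
driver.get('http://example.com')
for cookie in s.cookies:
    driver.add_cookie({
        'name': cookie.name,
        'value': cookie.value,
        'path': '/',
        'domain': cookie.domain,
    })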
driver should now have all of the cookies (and therefore sessions) that Requests has.
I'm trying to scrape from sites after authentication. I was able to take the JSESSIONID cookie from an authenticated browser session and download the correct page using urlopener like below.
import cookielib, urllib2
cj = cookielib.CookieJar()
c1 = cookielib.Cookie(None, "JSESSIONID", SESSIONID, None, None, DOMAIN,
True, False, "/store",True, False, None, False, None, None, None)
cj.set_cookie(c1)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fh = opener.open(url)
But when I use this code for creating Scrapy requests (I tried both dict cookies and a cookiejar), the downloaded page is the non-authenticated version. Does anyone know what the problem is?
cookies = [{
'name': 'JSESSIONID',
'value': SESSIONID,
'path': '/store',
'domain': DOMAIN,
'secure': False,
}]
request1 = Request(url, cookies=self.cookies, meta={'dont_merge_cookies': False})
request2 = Request(url, meta={'dont_merge_cookies': True, 'cookiejar': cj})
You were able to get the JSESSIONID from your browser.
Why not let Scrapy simulate a user login for you?
Then, I think your JSESSIONID cookie will stick to subsequent requests, given that:
Scrapy uses a single cookie jar (as opposed to multiple cookie sessions per spider) for the entire spider lifetime containing all your scraping steps,
the COOKIES_ENABLED setting for the cookie middleware defaults to True,
dont_merge_cookies defaults to False:
When some site returns cookies (in a response) those are stored in the cookies for that domain and will be sent again in future requests. That’s the typical behaviour of any regular web browser. However, if, for some reason, you want to avoid merging with existing cookies you can instruct Scrapy to do so by setting the dont_merge_cookies key to True in the Request.meta.
Example of request without merging cookies:
request_with_cookies = Request(url="http://www.example.com",
cookies={'currency': 'USD', 'country': 'UY'},
meta={'dont_merge_cookies': True})
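The "stored cookies are sent again in future requests" behaviour, including the /store path restriction on the question's JSESSIONID cookie, can be checked offline with the standard library's http.cookiejar (the Python 3 name of the cookielib module used in the question). A sketch with placeholder values: www.example.com and abc123 stand in for the question's DOMAIN and SESSIONID, and the helper names are illustrative:

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path):
    """Build a plain Netscape-style cookie, like the cookielib.Cookie in the question."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would attach to a request for url, or None."""
    request = urllib.request.Request(url)  # no network traffic happens here
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("JSESSIONID", "abc123", "www.example.com", "/store"))
```

Here cookie_header_for(jar, "http://www.example.com/store/cart") returns the JSESSIONID header, while a URL outside /store or on another host gets no cookie at all, which is worth checking if the authenticated page lives outside the cookie's path.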