Python requests session to open browser using selenium - python

I'm looking to use requests.Session and BeautifulSoup. If a specific status of 503 is identified, I want to open that session in a web browser. The problem is I have no idea how to move a Python requests session into a browser using Selenium. Any guidance would be appreciated.

Requests sessions have CookieJar objects whose cookies you can import into Selenium.
For example:
import requests
from selenium import webdriver

driver = webdriver.Firefox()
s = requests.Session()
s.get('http://example.com')
# the browser must already be on the target domain before add_cookie() works
driver.get('http://example.com')
for cookie in s.cookies:
    driver.add_cookie({
        'name': cookie.name,
        'value': cookie.value,
        'path': '/',
        'domain': cookie.domain,
    })
driver should now have all of the cookies (and therefore sessions) that Requests has.
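Tying this back to the original question, the 503 handoff can be sketched as below. This is only a sketch under stated assumptions: `open_in_browser_on_503` and `cookies_for_selenium` are hypothetical helper names, and the flow assumes the session's cookies are what the browser needs to continue.

```python
def cookies_for_selenium(requests_cookies):
    """Convert each cookie in a requests CookieJar into the dict
    shape that Selenium's driver.add_cookie() accepts."""
    return [
        {"name": c.name, "value": c.value,
         "path": c.path or "/", "domain": c.domain}
        for c in requests_cookies
    ]

def open_in_browser_on_503(session, url):
    """If `url` returns 503 through the requests session, hand the
    session's cookies to a fresh Selenium browser and reload there."""
    response = session.get(url)
    if response.status_code != 503:
        return response
    from selenium import webdriver
    driver = webdriver.Firefox()
    driver.get(url)  # must be on the domain before add_cookie() works
    for cookie in cookies_for_selenium(session.cookies):
        driver.add_cookie(cookie)
    driver.get(url)  # reload the page with the transferred cookies
    return None
```

The conversion helper is kept separate so it can be reused (or tested) without starting a browser.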

Related

get cookies for www subdomain, or a particular domain?

I'm calling get_cookies() on my Selenium web driver. Of course we know this fetches the cookies for the current domain. However, many popular sites set cookies on both example.com and www.example.com.
Technically, it's not really a "separate domain" or even a subdomain. I think nearly every website on the internet serves the same site at the www subdomain as it does at the root.
So is it still impossible to save cookies for the two domains, since one is a subdomain? I know the answer is complicated if you want to save cookies for all domains, but I figured this is kind of different since they really are the same domain.
Replicate it with this code:
from selenium import webdriver
import requests
driver = webdriver.Firefox()
driver.get("https://www.instagram.com/")
print(driver.get_cookies())
output:
[{'name': 'ig_did', 'value': 'F5FDFBB0-7D13-4E4E-A100-C627BD1998B7', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': True, 'expiry': 1671083433}, {'name': 'mid', 'value': 'X9hOqQAEAAFWnsZg8-PeYdGqVcTU', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1671083433}, {'name': 'ig_nrcb', 'value': '1', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639547433}, {'name': 'csrftoken', 'value': 'Yy8Bew6500BinlUcAK232m7xPnhOuN4Q', 'path': '/', 'domain': '.instagram.com', 'secure': True, 'httpOnly': False, 'expiry': 1639461034}]
Then load the page in a fresh browser instance and check yourself: cookies are set for www as well, while get_cookies() only reports the ones on the main domain. My idea is to use the requests library and fetch all the cookies via a plain HTTP request:
import requests
# Making a get request
response = requests.get('https://www.instagram.com/')
# printing request cookies
print(response.cookies)
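Since both Selenium's get_cookies() output and a requests CookieJar carry a per-cookie domain attribute, grouping by that attribute shows exactly which (sub)domain each cookie was set on. A small helper, assuming only that the jar yields objects with .domain, .name, and .value (as both requests and http.cookiejar cookies do); the function name is mine:

```python
def cookies_by_domain(jar):
    """Group (name, value) pairs by the domain attribute stored on
    each cookie in the jar."""
    grouped = {}
    for cookie in jar:
        grouped.setdefault(cookie.domain, []).append(
            (cookie.name, cookie.value))
    return grouped
```

Calling `cookies_by_domain(response.cookies)` on the Instagram response should make it obvious whether the cookies live on `.instagram.com`, `www.instagram.com`, or both.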
Domain
To host your application on the internet, you need a domain name. Domain names act as a placeholder for the complex string of numbers known as an IP address. As an example,
https://www.instagram.com/
With the latest Firefox v84.0, accessing the Instagram application shows cookies like those in the get_cookies() output above set within the https://www.instagram.com domain.
Subdomain
A subdomain is an add-on to your primary domain name. For example, when using a site such as Craigslist, you are always using a subdomain like reno.craigslist.org or sfbay.craigslist.org, and you will automatically be forwarded to the subdomain that corresponds to your physical location. Essentially, a subdomain is a separate part of your website that operates under the same primary domain name.
Reusing cookies
If you have stored the cookies from domain example.com, these stored cookies can't be pushed through the webdriver session to any other domain, e.g. example.edu. The stored cookies can be used only within example.com. Further, to automatically log a user in in the future, you need to store the cookies only once, and that's when the user has logged in. Before adding back the cookies you need to browse to the same domain from which the cookies were collected.
Demonstration
As an example, you can store the cookies once the user has logged in within an application as follows:
from selenium import webdriver
import pickle
driver = webdriver.Chrome()
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
driver.find_element_by_name("username").send_keys("abc123")
driver.find_element_by_name("password").send_keys("123xyz")
driver.find_element_by_name("submit").click()
# storing the cookies
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))
driver.quit()
Later, at any point in time, if you want the user automatically logged in, you need to browse to the specific domain/URL first and then add the cookies as follows:
from selenium import webdriver
import pickle
driver = webdriver.Chrome()
driver.get('http://demo.guru99.com/test/cookie/selenium_aut.php')
# loading the stored cookies
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
    # adding the cookies to the session through the webdriver instance
    driver.add_cookie(cookie)
driver.get('http://demo.guru99.com/test/cookie/selenium_cookie.php')
Reference
You can find a detailed discussion in:
org.openqa.selenium.InvalidCookieDomainException: Document is cookie-averse using Selenium and WebDriver

using python-requests to access a site with captcha

I've searched the web on how to access a website using requests; essentially the site asks the user to complete a captcha form before they can access it.
As of now I understand the process should be
visit the site using selenium
from selenium import webdriver
browser = webdriver.Chrome('chromedriver.exe')
browser.get('link-to-site')
complete the captcha form
save the cookies from that Selenium session (since somehow these cookies will contain data showing that you've completed the captcha)
input('cookies ready ?')
pickle.dump(browser.get_cookies(), open("cookies.pkl", "wb"))
open a request session
get the site
import requests
session = requests.session()
r = session.get('link-to-site')
then load the cookies in
with open('cookies.pkl', 'rb') as f:
    cookies = pickle.load(f)  # the file was written with pickle, not json
for cookie in cookies:  # get_cookies() returns a list of dicts
    session.cookies.set(cookie['name'], cookie['value'],
                        domain=cookie.get('domain'), path=cookie.get('path', '/'))
But I'm still unable to access the site, so I'm assuming the google captcha hasn't been solved when I'm using requests.
So there must be a correct way to go about this, I must be missing something?
You need to load the site after setting the cookies; otherwise, the response is what it would be without any cookies. Having said that, you will normally need to submit the form with Selenium and then collect the cookies, as a captcha doesn't normally set a cookie by itself.
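Putting the answer's ordering into code: load the saved cookies into the session first, then request the page. A sketch under stated assumptions: the helper name `load_selenium_cookies` is mine, and `cookies.pkl` is assumed to hold the pickled list of dicts that `driver.get_cookies()` produces.

```python
import pickle

def load_selenium_cookies(session, path="cookies.pkl"):
    """Copy cookies saved from Selenium (a pickled list of dicts, as
    produced by driver.get_cookies()) into a requests session."""
    with open(path, "rb") as f:
        for cookie in pickle.load(f):
            session.cookies.set(cookie["name"], cookie["value"],
                                domain=cookie.get("domain"),
                                path=cookie.get("path", "/"))
```

Usage would then be: create the session, call `load_selenium_cookies(session)`, and only after that issue `session.get('link-to-site')`.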

Move cookies from requests to selenium

I would like to move the cookies in a python requests session to my selenium browser. At the moment, I am doing this:
cookies = session.cookie_jar
for cookie in cookies:  # add success cookies
    driver.add_cookie({'name': cookie.name, 'value': cookie.value, 'path': cookie.path, 'expiry': cookie.expires})
However, I get some errors like
AttributeError: 'Morsel' object has no attribute 'path'
How can I fix that?
Thanks.
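The `Morsel` error suggests the loop is iterating `http.cookies.Morsel` objects rather than a requests CookieJar; a `requests.Session` keeps its cookies in `session.cookies`, which yields `http.cookiejar.Cookie` objects that do have `.path` and `.expires`. A sketch of the conversion (the function name is mine, and dropping a `None` expiry is a common workaround since Selenium rejects a null expiry):

```python
def requests_cookies_to_selenium(session):
    """Build dicts that driver.add_cookie() accepts from a requests
    session's cookies, omitting an unset expiry."""
    out = []
    for cookie in session.cookies:
        d = {"name": cookie.name, "value": cookie.value,
             "path": cookie.path or "/"}
        if cookie.expires is not None:
            d["expiry"] = cookie.expires
        out.append(d)
    return out
```

Each dict from this helper can then be passed to `driver.add_cookie()` after navigating to the matching domain.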

Selenium Post method

I'm trying to find a way to get the response of a post method executed through headless browser.
import requests

session = requests.Session()
session.get('<url here>')
print(session.cookies)
r = session.post('<url here>').content
print(r)
The problem is that the response r is full of javascript and I can't use Selenium to execute it because it doesn't support the POST method (as far as I know).
Any ideas?
You can try using selenium-requests:
Extends Selenium WebDriver classes to include the request function
from the Requests library, while doing all the needed cookie and
request headers handling.
Example:
from seleniumrequests import Firefox
webdriver = Firefox()
response = webdriver.request('POST', 'url here', data={"param1": "value1"})
print(response)

Python authenticate and launch private page using webbrowser, urllib and CookieJar

I want to log in with a cookiejar and launch not the login page but a page that can only be seen after authentication. I know mechanize does that, but besides not working for me now, I'd rather do this without it. Now I have,
import urllib, urllib2, cookielib, webbrowser
from cookielib import CookieJar
username = 'my_username'
password = 'my_password'
url = 'my_login_page'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'my_username' : username, 'my_password' : password})
opener.open(url, login_data)
page_to_launch = 'my_authenticated_url'
webbrowser.open(page_to_launch, new=1, autoraise=1)
I am either able to login and dump the authenticated page to stdout, or launch the login page without recognizing the cookie, but I am not able to launch the page I want to after logging in. Help appreciated.
You could use the selenium module to do this. It starts a browser (Chrome, Firefox, IE, etc.) with an extension loaded into it that allows you to control the browser.
Here's how you load cookies into it:
from selenium import webdriver
driver = webdriver.Firefox() # open the browser
# Go to the correct domain
driver.get("http://www.example.com")
# Now set the cookie. Here's one for the entire domain
# the cookie name here is 'key' and its value is 'value'
driver.add_cookie({'name':'key', 'value':'value', 'path':'/'})
# additional keys that can be passed in are:
# 'domain' -> String,
# 'secure' -> Boolean,
# 'expiry' -> Seconds since the Epoch when it should expire.
# finally we visit the hidden page
driver.get('http://www.example.com/secret_page.html')
Your cookies aren't making it to the browser.
webbrowser has no facilities for accepting the cookies stored in your CookieJar instance. It's simply a generic interface for launching a browser with a URL. You will either have to implement a CookieJar that can store cookies in your browser (which is almost certainly no small task) or use an alternative library that solves this problem for you.