Python script can't change user agent and proxies on Chrome

I have a script that starts a Chrome session (not headless). I want to change and rotate the user agent and the proxy with the fake_useragent and proxy_randomizer libraries.
I have seen some topics here that explain how to do this, but it does not work in my Chrome session.
I use chromedriver 98.0.4758.82 and Chrome browser 98.0.4758.82, and I get no error message in the terminal. The "userAgent" and "proxf" strings rotate correctly, so I think the error is in the arguments passed to the Chrome options, but I am not sure.
Here is my current script:
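import warnings
import requests
from fake_useragent import UserAgent
from proxy_randomizer import RegisteredProviders
from proxy_randomizer.proxy import Anonymity
from selenium import webdriver
from selenium.webdriver.chrome.options import Options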
# URL
url = "https://www.carrefour.fr"
warnings.filterwarnings("ignore")
# Proxies
rp = RegisteredProviders()
rp.parse_providers()
#print(f"proxy: {rp.get_random_proxy()}")
for proxy in rp.proxies:
    proxies = { "https": f"{proxy.ip_address}:{proxy.port}" }
#response = requests.get("http://google.com", proxies=proxies)
prox = rp.get_random_proxy()
proxip = prox.ip_address
proxpo = prox.port
proxf = "https://"+str(proxip)+":"+str(proxpo)
#print(prox)
#print(proxip)
#print(proxpo)
#print(proxf)
# Ouverture Chrome
options = Options()
ua = UserAgent()
userAgent = ua.random
print(userAgent)
#options.headless = True
driver = webdriver.Chrome(options=options)
options.add_argument(f'user-agent={userAgent}')
options.add_argument('--proxy-server=%s' % proxf)
driver.get(url)
driver.execute_script("return navigator.userAgent;")
Does anyone have an idea why my user agent and proxy are not used in the Chrome session? I think my calls are correct, but I am not sure.
Thanks for your help!
Bye

It works now. Here is the corrected code:
import random
from fake_useragent import UserAgent
from proxy_randomizer import RegisteredProviders
from proxy_randomizer.proxy import Anonymity
import requests
from fresh_useragent import UserAgent
def getRandomUserAgent():
    # Pick one user agent string from a local file, one string per line
    lines = open('UAStrings.txt').read().splitlines()
    return random.choice(lines)
ua = getRandomUserAgent()
prox = rp.get_random_proxy()
proxl = "http://"+str(prox)
options.add_argument(f'user-agent={ua}')
options.add_argument(f'--proxy-server={proxl}')
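One likely reason the script in the question had no effect is ordering: Selenium hands the options to Chrome when webdriver.Chrome(...) is called, so arguments added after that call never reach the browser. Below is a minimal sketch of the ordering. The helper names build_chrome_arguments and start_chrome are hypothetical, and the proxy address in the usage note is a placeholder, not a working proxy.

```python
def build_chrome_arguments(user_agent, proxy_host, proxy_port, scheme="http"):
    """Return the Chrome command-line switches for a user agent and a proxy."""
    return [
        f"--user-agent={user_agent}",
        f"--proxy-server={scheme}://{proxy_host}:{proxy_port}",
    ]


def start_chrome(arguments):
    """Create a Chrome session with every switch applied before launch."""
    # Imported lazily so build_chrome_arguments works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in arguments:
        options.add_argument(argument)
    # The options are read once, at this call. Arguments added to the
    # options object afterwards do not affect the running browser.
    return webdriver.Chrome(options=options)
```

Usage would look like start_chrome(build_chrome_arguments(ua, "203.0.113.10", 8080)). Rotating the user agent or proxy then means quitting the driver and starting a new one with fresh arguments.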

Related

How Do I Monitor Network Flow with Selenium?

I am trying to scrape data from this URL with Python and Selenium:
https://shopee.co.id/PANCI-PRESTO-24cm-3.5L-TEFLON-i.323047288.19137193916?sp_atk=7e8e7abc-834c-4f4a-9234-19da9ddb2445&xptdk=7e8e7abc-834c-4f4a-9234-19da9ddb2445
If you watch the network traffic, you will see that the page calls a back-end API such as https://shopee.co.id/api/v4/item/get?itemid=19137193916&shopid=323047288. How can I get the response returned by this API with Selenium?

Solved!
import json
import time
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Set up Selenium webdriver
capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
options = webdriver.ChromeOptions()
options.binary_location = "/usr/bin/brave"
options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(desired_capabilities=capabilities, options=options)
# Navigate to URL and monitor network flow
url = "https://shopee.co.id/PANCI-PRESTO-24cm-3.5L-TEFLON-i.323047288.19137193916?sp_atk=7e8e7abc-834c-4f4a-9234-19da9ddb2445&xptdk=7e8e7abc-834c-4f4a-9234-19da9ddb2445"
driver.get(url)
time.sleep(3) # Wait for the page to load
# Find any API requests and print the returned data to the screen
logs = driver.get_log("performance")
for entry in logs:
    message = entry.get("message", {})
    parsed_message = json.loads(message)
    message_dict = parsed_message.get("message", {})
    method = message_dict.get("method")
    if method == "Network.requestWillBeSent":
        request = message_dict.get("params", {}).get("request", {})
        url = request.get("url")
        if "https://shopee.co.id/api/v4/item/get?itemid=19137193916&shopid=323047288" in url:
            response_url = url.replace("request", "response")
            response = driver.execute_cdp_cmd(
                "Network.getResponseBody", {"requestId": message_dict.get("params", {}).get("requestId")}
            )
            with open("response.json", "w") as f:
                f.write(response.get("body", ""))
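The log-filtering step can be separated from the browser so it is easier to test. The sketch below is one way to do that, not the answer's exact method: it defaults to Network.responseReceived events, on the assumption that asking for a body is more reliable once the response headers have arrived. The function name find_request_ids is hypothetical.

```python
import json


def find_request_ids(log_entries, url_fragment, method="Network.responseReceived"):
    """Return requestIds of performance-log events whose URL contains url_fragment."""
    request_ids = []
    for entry in log_entries:
        # Each log entry wraps a JSON string; the CDP event sits under "message".
        message = json.loads(entry["message"])["message"]
        if message.get("method") != method:
            continue
        params = message.get("params", {})
        # responseReceived events carry the URL under "response",
        # requestWillBeSent events carry it under "request".
        payload = params.get("response") or params.get("request") or {}
        if url_fragment in payload.get("url", ""):
            request_ids.append(params.get("requestId"))
    return request_ids
```

Each returned id can then be passed to driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}), as in the answer above.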

I use Selenium Wire for this. Install it with pip install selenium-wire, then import it into your project and use it like this:
from seleniumwire import webdriver
#Sets the Option to disable response encoding
sw_options = {
'disable_encoding': True
}
#Creates driver with selected options
driver = webdriver.Chrome(seleniumwire_options=sw_options)
#Selenium Wire captures traffic automatically, so no request interceptor is needed here
#Navigate to page
driver.get('https://shopee.co.id/PANCI-PRESTO-24cm-3.5L-TEFLON-i.323047288.19137193916?sp_atk=7e8e7abc-834c-4f4a-9234-19da9ddb2445&xptdk=7e8e7abc-834c-4f4a-9234-19da9ddb2445')
#Iterate through requests and find the one with the endpoint you need in the url
for a in driver.requests:
    if "/api/v4/item/get?itemid=19137193916&shopid=323047288" in a.url:
        body = a.response.body
        print(body)
We add disable_encoding to the options; otherwise the body comes back encoded and you would have to decode it manually, which can be done like this (decode is imported from seleniumwire.utils):
body = decode(response.body, response.headers.get('Content-Encoding', 'identity'))
Or set it in the Selenium Wire options, as I did.
You can find more information here:
https://pypi.org/project/selenium-wire/#response-objects
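To illustrate what "decoding manually" involves, here is a standard-library sketch that undoes the two most common Content-Encoding values. It is an illustration only and not Selenium Wire's implementation: the name decode_body is hypothetical, and encodings such as br (Brotli) need a third-party package, so they raise an error here.

```python
import gzip
import zlib


def decode_body(body, content_encoding="identity"):
    """Decompress a response body according to its Content-Encoding header."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate data without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

The result is still bytes; call .decode("utf-8") on it (or pass it to json.loads) if the API returns JSON text.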

How to get Network status code from FETCH/XHR in selenium python

I want to fetch the status code of a Fetch/XHR request from the Network tab.
I want to get "Status Code: 200" from the response. Can I get this using Selenium in Python?
I tried with:
from selenium import webdriver
import os
# for local
dpath = os.getcwd()+"/"+'chromedriver'
# create webdriver object
driver = webdriver.Chrome(executable_path=dpath, options=options)
url = "https://pizzaonline.dominos.co.in/cart"
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}
# get geeksforgeeks.org
driver.get("https://www.geeksforgeeks.org/")
# get browser log
logs = driver.get_log("browser")
but it does not work.

You can use requests package within python:
import requests
url = 'your url'
status_code = requests.get(url).status_code
print(status_code) #This will just print the status code of the url
#do your stuff here
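If the status code has to come from the browser session itself rather than from a separate request, Chrome's performance log is one option. It must be enabled with the goog:loggingPrefs capability and read with driver.get_log("performance"), not get_log("browser"). The sketch below assumes that setup; the function name collect_status_codes is hypothetical.

```python
import json


def collect_status_codes(log_entries, resource_types=("XHR", "Fetch")):
    """Map each response URL to its HTTP status for the given resource types."""
    statuses = {}
    for entry in log_entries:
        # Each log entry wraps a JSON string; the CDP event sits under "message".
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        if params.get("type") not in resource_types:
            continue
        response = params.get("response", {})
        statuses[response.get("url")] = response.get("status")
    return statuses
```

With a live session this would be called as collect_status_codes(driver.get_log("performance")) after the page has loaded.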

Scrape Instagram names with BeautifulSoup in Python

I'm trying to make Instagram scraper with BeautifulSoup. I just want to get the name of the profile. (I'm using Jennifer Lopez profile)
This is the code that I have:
import requests
from bs4 import BeautifulSoup
instagram_url = "https://www.instagram.com"
username = "jlo"
profile = instagram_url + "/" + username
response = requests.get(profile)
print(response.text)
if response.ok:
    html = response.text
    bs_html = BeautifulSoup(html)
    name = bs_html('#react-root > section > main > div > header > section > div.-vDIg > h1')
    print(name) #this should be Jennifer Lopez
The code works up to print(response.text); the problem is in the if statement.
This is the warning that I get:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml").
And I do not get the name.
Do you know what the problem is? I have also tried the approach below. When I download the page and parse the saved file with .find, it works well (for every profile), but when I try to do the same with the URL, it does not work.
Is there a way to do this without using Selenium?
from urllib.request import urlopen
from bs4 import BeautifulSoup
#this works
with open('Jennifer.html', encoding = 'utf-8') as html:
    bs = BeautifulSoup(html, 'lxml')
name = bs.find('h1', class_='rhpdm')
name = str(name).split(">")[1].split("<")[0]
print(name)
#this does not work
html = urlopen('https://www.instagram.com/jlo/')
bs = BeautifulSoup(html, 'lxml')
name = bs.find('h1', class_='rhpdm')
print(name)

Here is a script that uses the Selenium Chrome driver.
You need a ChromeDriver build that is compatible with your browser: check your Chrome version and download the matching driver.
from bs4 import BeautifulSoup
from selenium import webdriver
instagram_url = "https://www.instagram.com"
username = "jlo"
profile = instagram_url + "/" + username
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
driver=webdriver.Chrome(r'D:\chromedriver.exe',chrome_options=chrome_options)
driver.get(profile)
html=driver.page_source
driver.close()
soup=BeautifulSoup(html,'html.parser')
print(soup.select_one('.rhpdm').text)

Here you go! You can do it like this.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
binary = r'C:\Program Files\Mozilla Firefox\firefox.exe' #this should be same if using windows
options = Options()
options.set_headless(headless=True)
options.binary = binary
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True #optional
driver = webdriver.Firefox(firefox_options=options, capabilities=cap, executable_path=r'Your Path') #put your geckodriver path here
#Above code should be the same for most of the time when you scrape.
#Below is the place where you will be making changes
instagram_url = "https://www.instagram.com"
username = "jlo"
profile = instagram_url + "/" + username
driver.get(profile)
soup=BeautifulSoup(driver.page_source, 'html.parser') #naming the parser avoids the "No parser was explicitly specified" warning
for x in soup.findAll('h1',{'class':'rhpdm'}):
    print(x.text.strip())
driver.quit()
Instructions for downloading geckodriver is here

Python web scraping Zacks website error: [WinError 10054] An existing connection was forcibly closed by the remote host

I would like to get the data located on this page:
https://www.zacks.com/stock/quote/MA
I've tried to do this with Beautiful Soup in Python but I get an error: "[WinError 10054] An existing connection was forcibly closed by the remote host".
Can someone guide me?
from bs4 import BeautifulSoup
import urllib
import re
import urllib.request
url = 'https://www.zacks.com/stock/quote/MA'
r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, "lxml")
soup
Thanks!

The website is blocking your request; the host probably rejects requests that arrive without a browser-like request header. You can simulate a "real" request with the Selenium package.
This works:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
options = Options()
options.set_headless(headless=True)
url = 'https://www.zacks.com/stock/quote/MA'
browser = webdriver.Firefox(firefox_options=options)
browser.get(url)
html_source = browser.page_source
soup = BeautifulSoup(html_source, "lxml")
print(soup)
browser.close()

The site is blocking the default Python user agent. The user agent basically says "who is making the request". Install the Python module fake-useragent and add a header to the request so that it looks like it comes from another client, such as Google Chrome or Mozilla Firefox. If you want a specific user agent, I recommend looking at fake-useragent.
I was not sure how to add a header with urllib (probably through a parameter), so here is a simple example using the requests module:
import requests
from fake_useragent import UserAgent
ua = UserAgent()
header = {
"User-Agent": ua.random
}
r = requests.get('https://www.zacks.com/stock/quote/MA', headers=header)
r.text #your html code
After this you can use Beautiful Soup with r.text as you did:
soup = BeautifulSoup(r.text, "lxml")
soup
EDIT:
After looking into it a bit: if you want to do it with urllib, you can do this:
import urllib.request
from fake_useragent import UserAgent
ua = UserAgent()
q = urllib.request.Request('https://www.zacks.com/stock/quote/MA')
q.add_header('User-Agent', ua.random)
a = urllib.request.urlopen(q).read()

Taken from this answer here:
It's fatal. The remote server has sent you a RST packet, which
indicates an immediate dropping of the connection, rather than the
usual handshake. This bypasses the normal half-closed state
transition. I like this description:
"Connection reset by peer" is the TCP/IP equivalent of slamming the
phone back on the hook. It's more polite than merely not replying,
leaving one hanging. But it's not the FIN-ACK expected of the truly
polite TCP/IP converseur."
This is because the User-Agent defined when making the Python Requests is not accepted by the queried site and hence the connection was dropped by the remote web server. Hence the connection reset error that you see. I tried doing a cURL request and it worked fine, so all you have to do is define your User-Agent in the header section. Something like this:
>>> header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0',}
>>> url = 'https://www.zacks.com/stock/quote/MA'
>>> r = requests.get(url, headers=header, verify=False)
>>> soups = BS(r.text,"lxml")
>>> print(soups.prettify())
And then make the required get requests and I'm hoping you'll be good.

Login to website with Python and Selenium and subprocess problem [duplicate]

Does anybody know whether Selenium (WebDriver preferably) can communicate with and act through a browser that was already running before the Selenium client was launched?
I mean whether Selenium can communicate with a browser without using the Selenium Server (which could be an Internet Explorer launched manually, for example).

This is a duplicate answer.
Reconnect to a driver in Python Selenium. This applies to all drivers and to the Java API as well.
Open a driver:
driver = webdriver.Firefox() #python
Extract the session_id and _url from the driver object:
url = driver.command_executor._url #"http://127.0.0.1:60622/hub"
session_id = driver.session_id #'4e167f26-dc1d-4f51-a207-f761eaf73c31'
Use these two parameters to connect to your driver:
driver = webdriver.Remote(command_executor=url,desired_capabilities={})
driver.close() # this prevents the dummy browser
driver.session_id = session_id
And you are connected to your driver again.
driver.get("http://www.mrsmart.in")

This is a pretty old feature request: Allow webdriver to attach to a running browser . So it's officially not supported.
However, there is some working code which claims to support this: https://web.archive.org/web/20171214043703/http://tarunlalwani.com/post/reusing-existing-browser-session-selenium-java/.

This snippet reuses an existing browser instance without opening a duplicate browser window. Found at Tarun Lalwani's blog.
from selenium import webdriver
from selenium.webdriver.remote.webdriver import WebDriver
# executor_url = driver.command_executor._url
# session_id = driver.session_id
def attach_to_session(executor_url, session_id):
    original_execute = WebDriver.execute
    def new_command_execute(self, command, params=None):
        if command == "newSession":
            # Mock the response
            return {'success': 0, 'value': None, 'sessionId': session_id}
        else:
            return original_execute(self, command, params)
    # Patch the function before creating the driver object
    WebDriver.execute = new_command_execute
    driver = webdriver.Remote(command_executor=executor_url, desired_capabilities={})
    driver.session_id = session_id
    # Replace the patched function with original function
    WebDriver.execute = original_execute
    return driver
bro = attach_to_session('http://127.0.0.1:64092', '8de24f3bfbec01ba0d82a7946df1d1c3')
bro.get('http://ya.ru/')

It is possible, but you have to hack it a little. Here is some code.
What you have to do is run the standalone server and "patch" RemoteWebDriver:
public class CustomRemoteWebDriver : RemoteWebDriver
{
public static bool newSession;
public static string capPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TestFiles", "tmp", "sessionCap");
public static string sessiodIdPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TestFiles", "tmp", "sessionid");
public CustomRemoteWebDriver(Uri remoteAddress)
: base(remoteAddress, new DesiredCapabilities())
{
}
protected override Response Execute(DriverCommand driverCommandToExecute, Dictionary<string, object> parameters)
{
if (driverCommandToExecute == DriverCommand.NewSession)
{
if (!newSession)
{
var capText = File.ReadAllText(capPath);
var sidText = File.ReadAllText(sessiodIdPath);
var cap = JsonConvert.DeserializeObject<Dictionary<string, object>>(capText);
return new Response
{
SessionId = sidText,
Value = cap
};
}
else
{
var response = base.Execute(driverCommandToExecute, parameters);
var dictionary = (Dictionary<string, object>) response.Value;
File.WriteAllText(capPath, JsonConvert.SerializeObject(dictionary));
File.WriteAllText(sessiodIdPath, response.SessionId);
return response;
}
}
else
{
var response = base.Execute(driverCommandToExecute, parameters);
return response;
}
}
}

From here, if the browser was manually opened, then remote debugging can be used:
Start chrome with
chrome --remote-debugging-port=9222
Or with optional profile
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\selenium\ChromeProfile"
Then:
Java:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
//Change chrome driver path accordingly
System.setProperty("webdriver.chrome.driver", "C:\\selenium\\chromedriver.exe");
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("debuggerAddress", "127.0.0.1:9222");
WebDriver driver = new ChromeDriver(options);
System.out.println(driver.getTitle());
Python:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
#Change chrome driver path accordingly
chrome_driver = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver, chrome_options=chrome_options)
print(driver.title)
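Attaching through debuggerAddress only works if Chrome is actually listening on that port, so a quick pre-flight check can save some confusion. The sketch below uses only the standard library; the helper names is_port_open and debugger_address are hypothetical, and port 9222 is just the value used in the commands above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def debugger_address(host="127.0.0.1", port=9222):
    """Format the value expected by the "debuggerAddress" Chrome option."""
    return f"{host}:{port}"
```

If is_port_open("127.0.0.1", 9222) returns False, Chrome was most likely started without --remote-debugging-port, or another Chrome instance was already running when it was launched.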

Inspired by Eric's answer, here is my solution to this problem for selenium 3.7.0. Compared with the solution at http://tarunlalwani.com/post/reusing-existing-browser-session-selenium/, the advantage is that there won't be a blank browser window each time I connect to the existing session.
import warnings
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.remote.errorhandler import ErrorHandler
from selenium.webdriver.remote.file_detector import LocalFileDetector
from selenium.webdriver.remote.mobile import Mobile
from selenium.webdriver.remote.remote_connection import RemoteConnection
from selenium.webdriver.remote.switch_to import SwitchTo
from selenium.webdriver.remote.webdriver import WebDriver
# This webdriver can directly attach to an existing session.
class AttachableWebDriver(WebDriver):
    def __init__(self, command_executor='http://127.0.0.1:4444/wd/hub',
                 desired_capabilities=None, browser_profile=None, proxy=None,
                 keep_alive=False, file_detector=None, session_id=None):
        """
        Create a new driver that will issue commands using the wire protocol.
        :Args:
         - command_executor - Either a string representing URL of the remote server or a custom
             remote_connection.RemoteConnection object. Defaults to 'http://127.0.0.1:4444/wd/hub'.
         - desired_capabilities - A dictionary of capabilities to request when
             starting the browser session. Required parameter.
         - browser_profile - A selenium.webdriver.firefox.firefox_profile.FirefoxProfile object.
             Only used if Firefox is requested. Optional.
         - proxy - A selenium.webdriver.common.proxy.Proxy object. The browser session will
             be started with given proxy settings, if possible. Optional.
         - keep_alive - Whether to configure remote_connection.RemoteConnection to use
             HTTP keep-alive. Defaults to False.
         - file_detector - Pass custom file detector object during instantiation. If None,
             then default LocalFileDetector() will be used.
        """
        if desired_capabilities is None:
            raise WebDriverException("Desired Capabilities can't be None")
        if not isinstance(desired_capabilities, dict):
            raise WebDriverException("Desired Capabilities must be a dictionary")
        if proxy is not None:
            warnings.warn("Please use FirefoxOptions to set proxy",
                          DeprecationWarning)
            proxy.add_to_capabilities(desired_capabilities)
        self.command_executor = command_executor
        if type(self.command_executor) is bytes or isinstance(self.command_executor, str):
            self.command_executor = RemoteConnection(command_executor, keep_alive=keep_alive)
        self.command_executor._commands['GET_SESSION'] = ('GET', '/session/$sessionId') # added
        self._is_remote = True
        self.session_id = session_id # added
        self.capabilities = {}
        self.error_handler = ErrorHandler()
        self.start_client()
        if browser_profile is not None:
            warnings.warn("Please use FirefoxOptions to set browser profile",
                          DeprecationWarning)
        if session_id:
            self.connect_to_session(desired_capabilities) # added
        else:
            self.start_session(desired_capabilities, browser_profile)
        self._switch_to = SwitchTo(self)
        self._mobile = Mobile(self)
        self.file_detector = file_detector or LocalFileDetector()
        self.w3c = True # added hardcoded
    def connect_to_session(self, desired_capabilities):
        response = self.execute('GET_SESSION', {
            'desiredCapabilities': desired_capabilities,
            'sessionId': self.session_id,
        })
        # self.session_id = response['sessionId']
        self.capabilities = response['value']
To use it:
if use_existing_session:
    browser = AttachableWebDriver(command_executor=('http://%s:4444/wd/hub' % ip),
                                  desired_capabilities=(DesiredCapabilities.INTERNETEXPLORER),
                                  session_id=session_id)
    self.logger.info("Using existing browser with session id {}".format(session_id))
else:
    browser = AttachableWebDriver(command_executor=('http://%s:4444/wd/hub' % ip),
                                  desired_capabilities=(DesiredCapabilities.INTERNETEXPLORER))
    self.logger.info('New session_id : {}'.format(browser.session_id))

It appears that this feature is not officially supported by selenium. But, Tarun Lalwani has created working Java code to provide the feature. Refer - http://tarunlalwani.com/post/reusing-existing-browser-session-selenium-java/
Here is the working sample code, copied from the above link:
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.remote.*;
import org.openqa.selenium.remote.http.W3CHttpCommandCodec;
import org.openqa.selenium.remote.http.W3CHttpResponseCodec;
import java.io.IOException;
import java.lang.reflect.Field;
import java.net.URL;
import java.util.Collections;
public class TestClass {
public static RemoteWebDriver createDriverFromSession(final SessionId sessionId, URL command_executor){
CommandExecutor executor = new HttpCommandExecutor(command_executor) {
@Override
public Response execute(Command command) throws IOException {
Response response = null;
if ("newSession".equals(command.getName())) {
response = new Response();
response.setSessionId(sessionId.toString());
response.setStatus(0);
response.setValue(Collections.<String, String>emptyMap());
try {
Field commandCodec = null;
commandCodec = this.getClass().getSuperclass().getDeclaredField("commandCodec");
commandCodec.setAccessible(true);
commandCodec.set(this, new W3CHttpCommandCodec());
Field responseCodec = null;
responseCodec = this.getClass().getSuperclass().getDeclaredField("responseCodec");
responseCodec.setAccessible(true);
responseCodec.set(this, new W3CHttpResponseCodec());
} catch (NoSuchFieldException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
}
} else {
response = super.execute(command);
}
return response;
}
};
return new RemoteWebDriver(executor, new DesiredCapabilities());
}
public static void main(String [] args) {
ChromeDriver driver = new ChromeDriver();
HttpCommandExecutor executor = (HttpCommandExecutor) driver.getCommandExecutor();
URL url = executor.getAddressOfRemoteServer();
SessionId session_id = driver.getSessionId();
RemoteWebDriver driver2 = createDriverFromSession(session_id, url);
driver2.get("http://tarunlalwani.com");
}
}
Your test needs to have a RemoteWebDriver created from an existing browser session. To create that Driver, you only need to know the "session info", i.e. address of the server (local in our case) where the browser is running and the browser session id. To get these details, we can create one browser session with selenium, open the desired page, and then finally run the actual test script.
I don't know if there is a way to get session info for a session which was not created by selenium.
Here is an example of session info:
Address of remote server : http://localhost:24266. The port number is different for each session.
Session Id : 534c7b561aacdd6dc319f60fed27d9d6.

All the solutions so far lacked certain functionality.
Here is my solution:
public class AttachedWebDriver extends RemoteWebDriver {
public AttachedWebDriver(URL url, String sessionId) {
super();
setSessionId(sessionId);
setCommandExecutor(new HttpCommandExecutor(url) {
@Override
public Response execute(Command command) throws IOException {
if (!"newSession".equals(command.getName())) {
return super.execute(command);
}
return super.execute(new Command(getSessionId(), "getCapabilities"));
}
});
startSession(new DesiredCapabilities());
}
}

Javascript solution:
I have successfully attached to existing browser session using this function
webdriver.WebDriver.attachToSession(executor, session_id);
Documentation can be found here.

I got a solution in Python: I modified the webdriver class based on the PersistenBrowser class that I found.
https://github.com/axelPalmerin/personal/commit/fabddb38a39f378aa113b0cb8d33391d5f91dca5
Replace the webdriver module /usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py
Example usage:
import sys
from selenium import webdriver as w  # assumed: "w" refers to the selenium webdriver package
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
runDriver = sys.argv[1]
sessionId = sys.argv[2]
def setBrowser():
    if eval(runDriver):
        # Start a new session
        webdriver = w.Remote(command_executor='http://localhost:4444/wd/hub',
                             desired_capabilities=DesiredCapabilities.CHROME,
                             )
    else:
        # session_id is accepted only by the patched webdriver module
        webdriver = w.Remote(command_executor='http://localhost:4444/wd/hub',
                             desired_capabilities=DesiredCapabilities.CHROME,
                             session_id=sessionId)
    url = webdriver.command_executor._url
    session_id = webdriver.session_id
    print(url)
    print(session_id)
    return webdriver

Use Chrome's built-in remote debugging. Launch Chrome with the remote debugging port open. I did this on OS X:
sudo nohup /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 &
Tell Selenium to attach to the remote debugging port:
from selenium import webdriver
options = webdriver.ChromeOptions()
# debuggerAddress attaches to the running Chrome; passing --remote-debugging-port as an argument would launch a new browser instead
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome("./chromedriver", chrome_options=options)

I'm using Rails + Cucumber + Selenium Webdriver + PhantomJS, and I've been using a monkey-patched version of Selenium Webdriver, which keeps PhantomJS browser open between test runs. See this blog post: http://blog.sharetribe.com/2014/04/07/faster-cucumber-startup-keep-phantomjs-browser-open-between-tests/
See also my answer to this post: How do I execute a command on already opened browser from a ruby file

Solution using Python programming language.
from selenium import webdriver
from selenium.webdriver.remote.webdriver import WebDriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
executor_url = "http://localhost:4444/wd/hub"
# Create a desired capabilities object as a starting point.
capabilities = DesiredCapabilities.FIREFOX.copy()
capabilities['platform'] = "WINDOWS"
capabilities['version'] = "10"
# ------------------------ STEP 1 --------------------------------------------------
# driver1 = webdriver.Firefox()
driver1 = webdriver.Remote(command_executor=executor_url, desired_capabilities=capabilities)
driver1.get('http://google.com/')
url = driver1.command_executor._url
print(driver1.command_executor._url)
print(driver1.session_id)
print(driver1.title)
# Serialize the session id in a file
session_id = driver1.session_id
# ------------------ END OF STEP 1 --------------------------------------------------
# Pass the session id from step 1 to step 2
# ------------------------ STEP 2 --------------------------------------------------
def attach_to_session(executor_url, session_id):
    original_execute = WebDriver.execute
    def new_command_execute(self, command, params=None):
        if command == "newSession":
            # Mock the response
            return {'success': 0, 'value': None, 'sessionId': session_id}
        else:
            return original_execute(self, command, params)
    # Patch the function before creating the driver object
    WebDriver.execute = new_command_execute
    temp_driver = webdriver.Remote(command_executor=executor_url)
    # Replace the patched function with original function
    WebDriver.execute = original_execute
    return temp_driver
# read the session id from the file
driver2 = attach_to_session(executor_url, existing_session_id)
driver2.get('http://msn.com/')
print(driver2.command_executor._url)
print(driver2.session_id)
print(driver2.title)
driver2.close()
# ------------------ END OF STEP 2 --------------------------------------------------

After trying most of these solutions, this solution has worked for me the best. Thanks to @Ahmed_Ashour.
For those who are struggling with this problem, here are a few tips to make your life a bit easier:
1- use a driver manager instead of a manually installed driver (to avoid compatibility issues)
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install(),options=chrome_options)
2- Make sure to close the running chrome instance before starting the new one with the debugging port
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\selenum\ChromeProfile"

This is pretty easy using the JavaScript selenium-webdriver client:
First, make sure you have a WebDriver server running. For example, download ChromeDriver, then run chromedriver --port=9515.
Second, create the driver like this:
var driver = new webdriver.Builder()
.withCapabilities(webdriver.Capabilities.chrome())
.usingServer('http://localhost:9515') // <- this
.build();
Here's a complete example:
var webdriver = require('selenium-webdriver');
var driver = new webdriver.Builder()
.withCapabilities(webdriver.Capabilities.chrome())
.usingServer('http://localhost:9515')
.build();
driver.get('http://www.google.com');
driver.findElement(webdriver.By.name('q')).sendKeys('webdriver');
driver.findElement(webdriver.By.name('btnG')).click();
driver.getTitle().then(function(title) {
console.log(title);
});
driver.quit();
