I've written a script using Selenium and pytest to scrape data from a site. However, the site blocks connections when two browsers access it from the same IP address, which makes it difficult to run several tests in parallel.
I want a different proxy for each test instance, but I haven't found a useful explanation or answer after some searching.
E.g., run tests 1 and 2 in parallel, both against the same site, but with a different proxy for each.
I don't have a working solution yet. Online I've found ways to rotate the proxy after x seconds, x tests, etc., but no way to give each test running in parallel its own proxy. My most recent attempt is below, along with the original conftest code that I'm using for my tests.
Original:
@pytest.fixture(scope="session")
def setup(request):
    print("Initiating chrome driver...")
    session = request.node
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("start-maximized")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-web-security")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument('--log-level=1')
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36')
    s = Service('C:/Users/44732/Desktop/Python/chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=chrome_options)
    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    for item in session.items:
        cls = item.getparent(pytest.Class)
        setattr(cls.obj, "driver", driver)
    stealth(driver,
            languages=["en-US", "en"],
            vendor="Google Inc.",
            platform="Win32",
            webgl_vendor="Intel Inc.",
            renderer="Intel Iris OpenGL Engine",
            fix_hairline=True,
            )
    yield driver
    driver.quit()
Most recent attempt:
@pytest.fixture(scope="session")
def setup(request):
    if __name__ == '__main__':
        print("Initiating chrome driver...")
        with open("proxies.txt") as f:
            for i, line in enumerate(f):
                line = line.strip()
                if len(line) > 0:
                    data, _ = line.split('#')
                    ip, port = data.split('+')
                    ip = ip.strip()
                    port = port.strip()
                    PROXY = ip + port
        session = request.node
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument("start-maximized")
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-web-security")
        chrome_options.add_argument("--disable-gpu")
        chrome_options.add_argument('--log-level=1')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        chrome_options.add_argument('--proxy-server=%s' % PROXY)
        chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36')
        s = Service('C:/Users/44732/Desktop/Python/chromedriver.exe')
        driver = webdriver.Chrome(service=s, options=chrome_options)
        driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
        for item in session.items:
            cls = item.getparent(pytest.Class)
            setattr(cls.obj, "driver", driver)
        stealth(driver,
                languages=["en-US", "en"],
                vendor="Google Inc.",
                platform="Win32",
                webgl_vendor="Intel Inc.",
                renderer="Intel Iris OpenGL Engine",
                fix_hairline=True,
                )
        yield driver
        driver.quit()
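One possible direction, sketched under assumptions rather than offered as a confirmed fix: if the tests are parallelised with pytest-xdist, each worker process gets an id such as gw0 or gw1 through the plugin's worker_id fixture, and that id can be used to pick a fixed entry from the proxy list. The helper names below (load_proxies, worker_index, proxy_for_worker) are hypothetical, and the file format is assumed to be one host:port per line with optional # comments, which differs from the format parsed in the attempt above.

```python
from pathlib import Path


def load_proxies(path):
    """Read one proxy per line, skipping blank lines and '#' comments."""
    proxies = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if line:
            proxies.append(line)
    return proxies


def worker_index(worker_id):
    """Map an xdist worker id ('gw0', 'gw1', ...) to an integer.

    When the run is not distributed, xdist reports 'master',
    which is treated as index 0 here.
    """
    if worker_id.startswith("gw") and worker_id[2:].isdigit():
        return int(worker_id[2:])
    return 0


def proxy_for_worker(proxies, worker_id):
    """Pick a proxy for this worker; distinct while workers <= proxies."""
    if not proxies:
        raise ValueError("proxy list is empty")
    return proxies[worker_index(worker_id) % len(proxies)]


# Hypothetical wiring in conftest.py (assumes pytest-xdist is installed):
#
# @pytest.fixture(scope="session")
# def proxy(worker_id):
#     return proxy_for_worker(load_proxies("proxies.txt"), worker_id)
#
# The driver fixture could then request `proxy` and call
# chrome_options.add_argument(f"--proxy-server={proxy}")
```

With xdist, a session-scoped fixture is created once per worker process, so each worker would start its own browser with its own proxy. If there are more workers than proxies, the modulo wraps around and two workers share one.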
Related
So, I'm having a problem selecting a certificate with Selenium in Python. I've tried accepting it as if it were an alert, but without success. Could someone help me?
Example.
Here's an example of my driver settings.
options = Options()
if self.invisivel:
    options.add_argument('headless')
options.add_argument('--log-level=3')
prefs = {"download.default_directory": self.diretorio}
options.add_experimental_option("prefs", prefs)
self.dc = DesiredCapabilities.CHROME
self.dc['goog:loggingPrefs'] = {'browser': 'ALL'}
options.add_argument("--window-size=1920,1200")
self.driver = webdriver.Chrome(ChromeDriverManager().install(), options=options,
                               desired_capabilities=self.dc)
stealth(self.driver,
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36',
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
You are trying to accept the alert, right?
This should work:
driver.switch_to.alert.accept()
I need to scrape a website, but it displays "Checking your browser before accessing" and prevents access to the site.
Do I have to define a cookie, or is there another solution?
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--window-size=1920,1080")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0")
mainbrowser = webdriver.Chrome(options=options)  # chrome_options= is the deprecated keyword
mainbrowser.get('https://trade.kraken.com/charts/KRAKEN:BTC-USDT')
sleep(20)
I have used the following options recently to avoid captcha detection on certain sites:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("./chrome_data") # Chrome Profile data (moved from ~/Library/Application Support/Google/Chrome)
options.add_argument("--user-data-dir=chrome-data")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
Furthermore, I've used the selenium-stealth library (https://pypi.org/project/selenium-stealth/), which packages many of the techniques used to avoid detection:
from selenium_stealth import stealth

driver = webdriver.Chrome(options=options)
stealth(
    driver,
    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36',
    languages=["en-US", "en"],
    vendor="Google Inc.",
    platform="Win32",
    webgl_vendor="Intel Inc.",
    renderer="Intel Iris OpenGL Engine",
    fix_hairline=True,
    run_on_insecure_origins=True)
I can't disable the chromedriver logging message "DevTools listening on ws:......" in cmd. I've tried the following:
options.add_argument("log-level=3")
options.add_argument("disable-logging")
options.add_experimental_option("excludeSwitches", ["enable-logging"])
but the message "DevTools listening on ws:....." still appears in cmd.
My code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep
options = webdriver.ChromeOptions()
options.add_argument("log-level=3")
options.add_argument("start-maximized")
options.add_argument("disable-logging")
options.add_experimental_option("excludeSwitches", ["enable-logging"])
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_experimental_option("prefs", {"credentials_enable_service": False, "profile.password_manager_enabled": False})
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), service_log_path = "NUL", options = options)
browser.set_window_size(360, 720)
browser.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
browser.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36"})
browser.implicitly_wait(5)
browser.get("https://www.instagram.com/")
sleep(5)
Maybe a little late, but this worked for me:
import logging
logger = logging.getLogger('urllib3.connectionpool')
logger.setLevel(logging.INFO)
logger = logging.getLogger('selenium.webdriver.remote.remote_connection')
logger.setLevel(logging.WARNING)
Hope this helps.
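A separate point that may be worth checking in the question's code: add_experimental_option stores each value under its option name, so the second call with "excludeSwitches" replaces the first, and "enable-logging" is no longer excluded. A minimal sketch of combining both switches into one list follows; merged_switches is a hypothetical helper, and whether this removes the DevTools message in a given setup is an assumption, not something confirmed here.

```python
def merged_switches(*groups):
    """Combine several switch lists into one, keeping order and dropping repeats."""
    merged = []
    for group in groups:
        for switch in group:
            if switch not in merged:
                merged.append(switch)
    return merged


switches = merged_switches(["enable-logging"], ["enable-automation"])

# Pass the combined list in a single call so neither value is overwritten:
# options.add_experimental_option("excludeSwitches", switches)
```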
After looking for information in the community, I saw in a post that the following code worked until a few days ago:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
browser=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
browser.execute_cdp_cmd('Network.setUserAgentOverride',
{"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4240.198 Safari/537.36'})
browser.get('https://www.bet365.com')
After that, the following worked as a solution:
Open chromedriver.exe with Notepad++, search for "cdc_", replace it with "xyz_", and save the file. Then add this line to the chromedriver options: options.add_argument('--disable-blink-features=AutomationControlled')
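The manual search-and-replace step can also be scripted. The following is only a sketch of that byte replacement, with replace_marker as a hypothetical helper name; it makes no claim that the patch still avoids detection with current Chrome versions. It works on a copy or a path you choose, and insists on equal-length markers so offsets inside the executable are not shifted.

```python
from pathlib import Path


def replace_marker(path, old=b"cdc_", new=b"xyz_"):
    """Replace every occurrence of `old` with `new` in a binary file.

    Returns the number of replacements made. The markers must have the
    same length so the rest of the file keeps its byte offsets.
    """
    if len(old) != len(new):
        raise ValueError("old and new must be the same length")
    target = Path(path)
    data = target.read_bytes()
    count = data.count(old)
    if count:
        target.write_bytes(data.replace(old, new))
    return count
```

Calling replace_marker("chromedriver.exe") would patch the file in place, so keeping a backup of the original binary is sensible.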
I don't know why this doesn't work for me. I am using Chrome 88.0.4324.146 and chromedriver 88.0.4324.96, and executing this code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("window-size=1920,1080")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
browser=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
browser.execute_cdp_cmd('Network.setUserAgentOverride',
{"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4240.198 Safari/537.36'})
browser.get('https://www.bet365.com')
But after executing it, the page gets stuck loading until it crashes.
import subprocess
#other imports
subprocess.Popen(
'"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe" --remote-debugging-port=9222', shell=True)
options = webdriver.ChromeOptions()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)
driver.maximize_window()
driver.get('https://www.bet365.com')
It seems that the site detects the automation somehow. A workaround is to open Chrome with a remote debugging address and then connect Selenium to it using the code above. Change the chrome.exe path to match your environment.
Note: Make sure you close all Chrome windows before running this script.
This is my code:
chrome_options = Options()
WINDOW_SIZE = "1920,1080"
path_profile = "C:\\Users\\xxxx\\AppData\\Local\\Google\\Chrome\\User Data"
chrome_options.add_argument("--user-data-dir="+path_profile)
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.add_argument("--enable-javascript")
chrome_options.add_argument('user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--ignore-ssl-errors')
chrome_options.headless = True
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options, executable_path=xxxxx)
driver.get('https://www.youtube.com/upload')
time.sleep(10)
driver.save_screenshot(dirname(abspath(__file__))+'/screen_shot.png')
driver.close()
My profile has an ad-blocker extension and my YouTube login cookies.
But when I take the screenshot,
I see that Selenium has not used the profile. Is there a way to do this? Thanks
Set path_profile to the full path of your profile, with the profile name at the end, e.g. Profile 1:
path_profile = "C:\\Users\\xxxx\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 1"
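Chrome treats the user data directory and the profile folder inside it as two separate settings, so if the single combined path does not load the profile, an alternative worth trying is to pass the parent folder as --user-data-dir and the profile name as --profile-directory. A minimal sketch, assuming a Windows-style path; profile_arguments is a hypothetical helper, not part of Selenium.

```python
from pathlib import PureWindowsPath


def profile_arguments(profile_path):
    """Split a full Chrome profile path into the two command-line switches."""
    path = PureWindowsPath(profile_path)
    return [
        f"--user-data-dir={path.parent}",    # e.g. ...\Chrome\User Data
        f"--profile-directory={path.name}",  # e.g. Profile 1
    ]


# Hypothetical usage with the options object from the question:
# for arg in profile_arguments(path_profile):
#     chrome_options.add_argument(arg)
```

Chrome generally refuses to share a user data directory between two running instances, so other Chrome windows using that profile may need to be closed first.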