The website redirects me to a CAPTCHA page (which is fine) but doesn't let me complete the CAPTCHA: a 403 response blocks the CAPTCHA widget from loading, so I cannot send it to 2captcha workers. I tried a VPN and tried switching to my friend's network, and I still get blocked. Is there an error in the code? Could it be the Chromium version (Chromium 104.0.5112.79 snap)?
from selenium import webdriver
from selenium_stealth import stealth
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path="/snap/chromium/2051/usr/lib/chromium-browser/chromedriver")
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
url = "https://www.ticketmaster.de/event/nfl-munich-game-seattle-seahawks-tampa-bay-buccaneers-tickets/467425?language=en-us"
driver.get(url)
time.sleep(5)
driver.quit()
Option 1: try clearing your cookies; you have probably ended up on a blacklist.
Option 2: the website detects Selenium. In that case, see this question: Can a website detect when you are using Selenium with chromedriver?
It's not entirely clear why the website redirects you to a reCAPTCHA page. However, with an almost identical configuration using chrome=104.0 and chromedriver=104.0, I can access the page perfectly.
Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
import time
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
# Selenium Stealth settings
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get('https://www.ticketmaster.de/event/nfl-munich-game-seattle-seahawks-tampa-bay-buccaneers-tickets/467425?language=en-us') # not detected
driver.save_screenshot("ticketmaster.png")
However, the second time I try to access the same page, I get the "Pardon the Interruption" page, which essentially means the navigation is blocked.
References
You can find a relevant detailed discussion in:
Can a website detect when you are using Selenium with chromedriver?
Related
So, I'm trying to write a script to log in at https://us.etrade.com/e/t/user/login
I am using Selenium for this, but the site somehow detects Selenium when it starts and shows a message saying the servers are crowded; when that happens, I can't log in. I've also tried undetected-selenium as well as selenium-stealth, but both got detected too. I really need to automate this login process. I've tried Python requests, but that doesn't work. I'm open to any other technology or method that allows me to do this automation. Please help.
Here's my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium_stealth import stealth
import time
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--browser')
chrome_options.add_argument('--no-sandbox')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
stealth(wd,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
wd.get("https://us.etrade.com/e/t/user/login")
Demo credentials would have helped us dig deeper into your specific use case.
However, using selenium-stealth I was able to avoid detection of the Selenium-driven, ChromeDriver-initiated Google Chrome browsing context quite easily.
Selenium 4 compatible code
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
options = Options()
options.add_argument("start-maximized")
# Hide the "Chrome is being controlled by automated test software" infobar
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
# Selenium Stealth settings
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
driver.save_screenshot('bot_sannysoft.png')
With the E*TRADE login page:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
import time
options = Options()
options.add_argument("start-maximized")
# Hide the "Chrome is being controlled by automated test software" infobar
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
# Selenium Stealth settings
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://us.etrade.com/e/t/user/login")
driver.save_screenshot('etrade_com_login.png')
I am using Selenium in Python for automation tasks that require Chrome profiles with special settings, so I am using
options.add_argument("--user-data-dir=path_to_chrome_file")
to load the Chrome profile. I am also using the following options:
options.add_argument("start-maximized")
options.add_argument("--disable-gpu")
options.add_argument("--disable-web-security")
options.add_experimental_option("detach", True)
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
ignored_exceptions = (
NoSuchElementException,
StaleElementReferenceException,
)
driver = webdriver.Chrome(options=options, executable_path="chromedriver")
stealth(
driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
Some of these options may not be necessary; they just accumulated over time.
Problem:
I found that sometimes (about 1 in 20 runs), the settings stored in the "--user-data-dir=path_to_chrome_file" directory are cleared and a new data directory is generated. Can someone help me figure out why this happens?
The same code structure and options work well on Windows.
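The root cause isn't stated in this thread. One thing worth ruling out on Linux is two Chrome processes sharing the same profile, for example a window from an earlier run left open by the detach option. A hedged sketch, assuming Chrome's Linux lock entries (SingletonLock, SingletonSocket, SingletonCookie) mark a profile as in use: check for them before launching, and point --user-data-dir at a throwaway copy so the original profile is never written to. The helper names profile_in_use and copy_profile are hypothetical.

```python
import os
import shutil
import tempfile

# Assumption: on Linux, Chrome creates these entries inside the user-data-dir
# while a browser process owns the profile (SingletonLock is usually a symlink).
LOCK_NAMES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def profile_in_use(user_data_dir: str) -> bool:
    """Return True if any Chrome singleton lock entry exists in the profile dir."""
    # lexists() also reports dangling symlinks, which is what SingletonLock often is.
    return any(os.path.lexists(os.path.join(user_data_dir, name)) for name in LOCK_NAMES)


def copy_profile(user_data_dir: str) -> str:
    """Copy the profile into a fresh temporary directory and return the copy's path.

    Raises RuntimeError if the source profile looks locked by another Chrome process.
    """
    if profile_in_use(user_data_dir):
        raise RuntimeError(f"{user_data_dir} looks locked by another Chrome process")
    work_dir = tempfile.mkdtemp(prefix="chrome-profile-")
    dest = os.path.join(work_dir, "profile")  # copytree needs a non-existent target
    shutil.copytree(user_data_dir, dest, symlinks=True)
    return dest
```

Usage, assuming the options object from the question: profile = copy_profile(path_to_chrome_file), then options.add_argument(f"--user-data-dir={profile}"), and after driver.quit() remove the copy with shutil.rmtree(os.path.dirname(profile)). If you keep detach=True, delete the copy only after that window is closed. Because every run starts from the untouched original, a reset inside the copy cannot wipe your saved settings, although changes made during a run are not kept either.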
I read that Selenium with Chrome can run faster if you use implicit waits, headless mode, ID and CSS selectors, etc. Before implementing those changes, I want to know whether cookies or caching could be slowing me down.
Does Selenium store cookies and cache like a normal browser, or does it reload all assets every time it navigates to a new page on a website?
If it reloads everything, that would slow down scraping websites with millions of nearly identical profile pages, where the scripts and images are similar for each profile.
If so, is there a way to avoid this problem? I'm interested in using cookies and cache during a session and then destroying them after the browser is closed.
Edit (more details):
sel_options = {'proxy': {'https': pString}}
prefs = {'download.default_directory' : dFolder}
options.add_experimental_option('prefs', prefs)
blocker = os.path.join( os.getcwd(), "extension_iijehicfndmapfeoplkdpinnaicikehn")
options.add_argument(f"--load-extension={blocker}")
wS = "--window-size="+s1+","+s2
options.add_argument(wS)
if headless == "yes": options.add_argument("--headless");
driver = uc.Chrome(seleniumwire_options=sel_options, options=options, use_subprocess=True, version_main=109)
stealth(driver, languages=["en-US", "en"], vendor="Google Inc.", platform="Win32", webgl_vendor="Intel Inc.", renderer="Intel Iris OpenGL Engine", fix_hairline=True)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": agent})
navigate("https://linkedin.com")
I don't think my proxy or extension is the culprit, because I have a similar automation app running with no speed issues.
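Yes. Selenium drives a real Chrome instance, so within a session it automatically stores cookies and caches assets just like a normal browser would; repeat visits to similar pages can reuse cached scripts and images, subject to the site's cache headers. What does not carry over is state between sessions: unless you pass --user-data-dir, each new WebDriver instance starts with a fresh temporary profile, so the first pages of every new session download their assets again.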
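If you would rather measure this than take it on faith, the page's Resource Timing API can show which assets came from the cache. A hedged sketch (the helper names summarize_cache_hits and cache_report are hypothetical): run a small script through driver.execute_script and treat an entry with transferSize 0 but a non-zero decodedBodySize as a cache hit. This is a heuristic; cross-origin resources served without a Timing-Allow-Origin header report both sizes as 0 and are counted as "unknown".

```python
from typing import Any, Dict, List

# JavaScript executed in the page: one dict per resource from the Resource Timing API.
RESOURCE_TIMING_JS = """
return performance.getEntriesByType('resource').map(function (e) {
  return {name: e.name, transferSize: e.transferSize, decodedBodySize: e.decodedBodySize};
});
"""


def summarize_cache_hits(entries: List[Dict[str, Any]]) -> Dict[str, int]:
    """Classify Resource Timing entries as cached, downloaded or unknown."""
    summary = {"cached": 0, "downloaded": 0, "unknown": 0}
    for entry in entries:
        transfer = entry.get("transferSize") or 0
        body = entry.get("decodedBodySize") or 0
        if transfer > 0:
            # Bytes crossed the network (this includes 304 revalidations,
            # which only transfer headers).
            summary["downloaded"] += 1
        elif body > 0:
            # A body was decoded but nothing was transferred: served from cache.
            summary["cached"] += 1
        else:
            # E.g. cross-origin resource without Timing-Allow-Origin.
            summary["unknown"] += 1
    return summary


def cache_report(driver: Any) -> Dict[str, int]:
    """Run the timing script in the driver's current page and summarize the result."""
    return summarize_cache_hits(driver.execute_script(RESOURCE_TIMING_JS))
```

To try it, open two profile pages in the same session (driver.get(first_url), then driver.get(second_url), both URLs hypothetical) and call cache_report(driver). A high "cached" count on the second page means the shared scripts and images were reused rather than reloaded.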
One solution is to use a separate WebDriver instance for each session and explicitly delete the cookies and stored site data at the end of each session.
Here is an example:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--disable-extensions")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_argument("--headless")
options.add_argument("--disable-features=VizDisplayCompositor")
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-logging")
options.add_argument("--disable-setuid-sandbox")
options.add_argument("--disable-seccomp-filter-sandbox")
driver = webdriver.Chrome(options=options)
# Your scraping code here...
# Cookies and cached assets are reused across driver.get() calls within this session.

# Clean up at the end of the session
driver.delete_all_cookies()  # cookies visible to the current page
driver.execute_script("window.sessionStorage.clear();")  # current origin only
driver.execute_script("window.localStorage.clear();")  # current origin only
driver.quit()  # quit() (not close()) ends the session and removes ChromeDriver's temporary profile
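The delete_all_cookies() method deletes the cookies visible to the current page, and the execute_script() calls clear the current origin's sessionStorage and localStorage. None of these clears the HTTP disk cache; if you need that without ending the session, driver.execute_cdp_cmd("Network.clearBrowserCache", {}) sends the corresponding Chrome DevTools Protocol command. Calling driver.quit() ends the session, and when no --user-data-dir is given, ChromeDriver normally deletes its temporary profile, cache included, along with it.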
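By default ChromeDriver already gives each session a temporary profile and normally deletes it on driver.quit(), so cookies and cache live only as long as that session. If you want the same lifetime to cover several consecutive (not concurrent) driver instances, for example when restarting the browser every few thousand pages, a hedged sketch: create one temporary directory, pass it as --user-data-dir and --disk-cache-dir to every driver you start, and delete it when you are done. The context manager name throwaway_profile_args is hypothetical.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator, List


@contextlib.contextmanager
def throwaway_profile_args() -> Iterator[List[str]]:
    """Yield Chrome arguments pointing at a temporary profile and disk cache.

    Everything under the temporary directory is deleted when the with-block
    exits, so cookies, site storage and the HTTP cache live only that long.
    """
    root = tempfile.mkdtemp(prefix="selenium-session-")
    try:
        yield [
            f"--user-data-dir={os.path.join(root, 'profile')}",
            f"--disk-cache-dir={os.path.join(root, 'cache')}",
        ]
    finally:
        # ignore_errors: Chrome may still be releasing files right after quit().
        shutil.rmtree(root, ignore_errors=True)
```

Inside the with block, add each returned string with options.add_argument(arg) before creating each driver, and make sure every driver.quit() runs before the block exits so Chrome is not still writing to the directory when it is removed.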
I am trying to open the following site using Selenium:
https://www.honestdoor.com/
Normally this works fine for every site with the following code:
(I am currently using Google Chrome version 98.0.4758, with ChromeDriverManager downloading the matching driver; see the code below.)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
import time
if __name__ == '__main__':
ua = UserAgent()
userAgent = ua.random
options = Options()
# options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('--window-size=1920,1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
srv=Service(ChromeDriverManager().install())
driver = webdriver.Chrome (service=srv, options=options)
waitWebDriver = WebDriverWait (driver, 10)
link = "https://www.honestdoor.com/"
# link = "https://www.bcassessment.ca/"
# driver.minimize_window() # optional
driver.get (link)
time.sleep(1000)
The site opens with Selenium as always, but then it immediately goes completely white and keeps loading forever, with the spinner circling in the top-left corner (I can only kill the Chrome task in Task Manager).
When I open the site in normal Chrome or in an incognito window, everything works fine; it only seems to crash when I open it with Selenium. Other sites (like https://www.bcassessment.ca/) give me no problems at all and open with Selenium as always.
Why is this not working for this particular website?
It's not entirely clear what exact issue you are facing while loading the website. However, I was able to load it using the following code block:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
options = Options()
options.add_argument("start-maximized")
# excludeSwitches must be set in one call: a second call overwrites the first list
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.honestdoor.com/")
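Separately, the time.sleep(1000) in the question's code makes a hang hard to diagnose. A hedged, stdlib-only sketch that polls document.readyState through driver.execute_script with a deadline and returns the last state it saw, so the script reports whether the page got stuck in "loading" or "interactive". WebDriverWait, already imported in the question, can run the same check with WebDriverWait(driver, 30).until(lambda d: d.execute_script("return document.readyState") == "complete"); this version returns the state instead of raising TimeoutException. The name wait_for_ready_state is hypothetical.

```python
import time
from typing import Any, Optional


def wait_for_ready_state(driver: Any, timeout: float = 30.0, poll: float = 0.5) -> Optional[str]:
    """Poll document.readyState until it is 'complete' or the deadline passes.

    Returns the last observed state ('loading', 'interactive' or 'complete'),
    or None if the page never answered the script call. Note that each
    execute_script call can itself block up to the driver's script timeout,
    so the total wait may exceed `timeout` on a hung renderer.
    """
    deadline = time.monotonic() + timeout
    state: Optional[str] = None
    while True:
        try:
            state = driver.execute_script("return document.readyState")
        except Exception:
            # A hung or crashed renderer usually surfaces as a WebDriver error;
            # keep the last state we did observe.
            pass
        if state == "complete" or time.monotonic() >= deadline:
            return state
        time.sleep(poll)
```

Because driver.get() itself waits for the load event under the default page load strategy, you may want to call driver.set_page_load_timeout(30) first and catch selenium.common.exceptions.TimeoutException around driver.get(link); wait_for_ready_state(driver) then tells you how far the page got.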