How to open a single tab in chromedriver with selenium - python

I'm using Selenium with Python 3.8 and wrote some code to control chromedriver.
The problem is that too many tabs, almost 10, open when I call driver.get().
They are really useless tabs and I only want to open one single tab with a browser.
What's the problem with my code?
Is there any solution?
Here is my code:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--log-level=3")
chrome_options.add_experimental_option('excludeSwitches',['enable-logging'])
chrome_options.add_argument("'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'")
driver = webdriver.Chrome("c:\chromedriver.exe", options=chrome_options)
driver.get('http://google.com')
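
One possible cause, offered as an assumption rather than a confirmed diagnosis: the user-agent argument in the code above is not written in Chrome's --name=value switch form. Chrome treats command-line arguments that are not switches as URLs to open, so a quoted string containing spaces may end up being opened as several tabs. Below is a minimal sketch of building the switch in the usual form. The names user_agent_switch and build_options are hypothetical.

```python
# Hypothetical helper: format a user-agent string as a Chrome command-line switch.
def user_agent_switch(user_agent: str) -> str:
    # Chrome switches take the form --name=value, with no surrounding quotes.
    return f"--user-agent={user_agent}"


USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# The string from the question: it does not start with "--", so it is not a switch.
malformed = "'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'"
well_formed = user_agent_switch(USER_AGENT)


def build_options():
    """Return ChromeOptions carrying the well-formed switch (requires selenium)."""
    from selenium import webdriver  # imported lazily; only needed when called

    options = webdriver.ChromeOptions()
    options.add_argument(well_formed)
    return options
```

Replacing the malformed add_argument line with options.add_argument(well_formed) is the only change this sketch suggests; the other options are left as they are.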

Related

While I'm scraping with Selenium it keeps telling me that I'm an unusual browser and that I have to enable JavaScript

I just started learning programming and began with scraping using Python and Selenium, but when I load the URL and send input to elements, the website keeps responding with: "Your browser is a bit unusual... Try disabling ad blockers and other extensions, enabling javascript, or using a different web browser."
I tried some of the solutions provided on the site, but none of them solved my problem.
Can you explain the problem and how to solve it with Python, please?
import selenium
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
driver = webdriver.Chrome('chromedriver.exe', options=options)
driver.set_window_size(620, 720)
driver.delete_all_cookies()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver.implicitly_wait(5)
options.add_argument("--headless")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
driver.get('https://sso.godaddy.com/v1/account/create?realm=idp&path=%2Fcontact%2Fvalidate%3FcontactType%3DphoneMobile%26app%3Dsso%26path%3Dprofile%252Fedit%26profileUpdate%3DTrue%26userInteraction%3DPROFILE_UPDATE&app=sso&auth_reason=1&iframe=false')
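
One detail that may matter here, stated as an assumption about the cause rather than a confirmed fix: ChromeOptions are read when webdriver.Chrome(...) is called, so the add_experimental_option and add_argument("--headless") calls made after the driver is created do not affect the browser that is already running. The sketch below sets every option before launch. It assumes Selenium 4 with a chromedriver that can be located automatically, the names CHROME_ARGS and make_driver are hypothetical, and it is not guaranteed to get past the site's bot check.

```python
# Arguments collected up front so they are all applied before the browser starts.
CHROME_ARGS = ["--incognito", "--headless", "--window-size=620,720"]

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36"
)


def make_driver():
    """Create a Chrome driver with every option set before launch (requires selenium)."""
    from selenium import webdriver  # imported lazily; only needed when called

    options = webdriver.ChromeOptions()
    for arg in CHROME_ARGS:
        options.add_argument(arg)
    options.add_argument(f"--user-agent={USER_AGENT}")
    options.add_experimental_option("excludeSwitches", ["enable-logging"])

    # The options are consumed here; changing them afterwards has no effect
    # on this browser session.
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(5)
    return driver
```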

How is my Selenium script getting detected?

My simple Python script using Selenium is not working properly. My hypothesis is that it's getting detected and flagged as a bot. The only purpose of the script is to log in to the zalando.pl website. No matter what I do, I get Error 403 ("Wystąpił błąd. Pracujemy nad jego usunięciem. Spróbuj ponownie później.", i.e. "An error occurred. We are working on fixing it. Please try again later.").
I've tried various methods to resolve the problem. I've tried to simulate human behavior using sleep with random durations (I've tried WebDriverWait as well). I've also tried to solve the problem using options passed to chromedriver, but it didn't help (I also edited the $cdc string using a hex editor). Besides all of the above, I tried undetected-chromedriver, but it didn't help either. Is there any way to get my script to work?
Here's the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'/chromedriver.exe')
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.zalando.pl/login')
time.sleep(7)
username_entry = driver.find_element(By.XPATH, '//*[@id="login.email"]')
username_entry.send_keys("login@mail.com")
time.sleep(1)
password_entry = driver.find_element(By.XPATH, '//*[@id="login.secret"]')
password_entry.send_keys("password")
time.sleep(4)
button_entry = driver.find_element(By.XPATH, '//*[@id="sso"]/div/div[2]/main/div/div[2]/div/div/div/form/button/span')
time.sleep(2)
button_entry.click()

Headless selenium exits immediately

I have a headless web scraper. When it runs, the scraper takes a base URL, scrapes the links on that page, and then scrapes the links it got from that page.
The problem I'm having is that when I run the scraper, it exits almost immediately. When I run the scraper normally (non-headless), it works perfectly fine.
These are my selenium arguments:
options = webdriver.ChromeOptions()
options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'),
options=options)
I've also tried adding these options but it gave me the same result:
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
How can I solve this? I'm trying to deploy this scraper to Heroku and none of the things I've tried above worked.
Basically, some websites won't load in headless mode unless a user agent is specified.
To fix this I added:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')
This fixed the problem of my scraper exiting immediately.

Getting blocked by a website with selenium and chromedriver

I'm having some trouble trying to access a website (bet365.com) with chromedriver and Selenium (it seems I'm being "blocked").
I can access the site with my ordinary chrome but when I try with chrome driver, it doesn't work.
I had this problem before and corrected it by using some options as below (python):
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'PATH_TO\chromedriver.exe')
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
driver.get("https://www.bet365.com/")
Now, the problem came back and this code is not working anymore to bypass the protection.
Can someone help me?
If the Chrome browsing context started by the Selenium-driven ChromeDriver is being detected, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.
undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
References
You can find a couple of relevant detailed discussions in:
Undetected Chromedriver not loading correctly

Why doesn't instagram work with Selenium headless Chrome?

I'm trying to build an Instagram bot that works headless, but it doesn't seem to find the username and password fields (i.e. NoSuchElementException).
I tried running this code to troubleshoot (it basically opens the Instagram homepage and takes a screenshot):
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument("--window-size=1920,1080")
browser = webdriver.Chrome(options=options)
browser.get("https://www.instagram.com")
browser.get_screenshot_as_file(f"screenshot.png")
and I got screenshots basically saying 'error, retry after several minutes' in French.
I tried finding the 'connectez-vous' button through Selenium, but every XPath I try doesn't work, and it's impossible to find it through F12.
The bot will later be uploaded to PythonAnywhere so I can run it in the cloud (so if you think I might run into other problems, let me know).
What do you suggest I do?
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
#options.headless = True
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument(
"user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
browser = webdriver.Chrome(options=options)
browser.get("https://www.instagram.com")
sleep(5)
#browser.refresh()
browser.get_screenshot_as_file(f"screenshot.png")
In headless Chrome, the default user agent contains "HeadlessChrome" rather than "Chrome", which lets Instagram detect that you are using headless Chrome.
You can prevent this by specifying a hardcoded user agent:
open a normal Chrome window, go to the Network tab in DevTools, open a request's headers, copy the User-Agent value, and use it in your code.
Headless browser detection
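
To illustrate the point about the user agent, here is a small sketch of what a default headless string tends to look like and how a replacement can be derived from it. The sample string and the helper names looks_headless and as_regular_chrome are assumptions for illustration only; copying the User-Agent header from a normal Chrome session, as described above, remains the more reliable route.

```python
# Sample of the kind of user agent headless Chrome reports by default
# (illustrative value; the real platform and version will differ).
HEADLESS_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/87.0.4280.88 Safari/537.36"
)


def looks_headless(user_agent: str) -> bool:
    """Return True if the user agent advertises headless Chrome."""
    return "HeadlessChrome" in user_agent


def as_regular_chrome(user_agent: str) -> str:
    """Replace the HeadlessChrome token with the regular Chrome token."""
    return user_agent.replace("HeadlessChrome", "Chrome")


fixed_ua = as_regular_chrome(HEADLESS_UA)
print(looks_headless(HEADLESS_UA))  # True
print(looks_headless(fixed_ua))     # False
```

The resulting string can then be passed with options.add_argument(f"user-agent={fixed_ua}") before the driver is created.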
