My simple Python script using Selenium is not working properly. My hypothesis is that it's getting detected and flagged as a bot. The only purpose of the script is to log in to the zalando.pl website. No matter what I do, I get Error 403 ("Wystąpił błąd. Pracujemy nad jego usunięciem. Spróbuj ponownie później." — "An error occurred. We are working on fixing it. Please try again later.").
I've tried various methods to resolve the problem. I've tried to simulate human behavior by sleeping for random intervals (I've tried WebDriverWait as well; a sketch of that variant follows the code below). I've also tried the usual chromedriver options, but they didn't help (I even edited the $cdc string with a hex editor). Besides all of the above, I tried undetected-chromedriver, but it didn't help either. Is there any way to make my script work?
Here's the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'/chromedriver.exe')
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.zalando.pl/login')
time.sleep(7)
username_entry = driver.find_element(By.XPATH, '//*[@id="login.email"]')
username_entry.send_keys("login@mail.com")
time.sleep(1)
password_entry = driver.find_element(By.XPATH, '//*[@id="login.secret"]')
password_entry.send_keys("password")
time.sleep(4)
button_entry = driver.find_element(By.XPATH, '//*[@id="sso"]/div/div[2]/main/div/div[2]/div/div/div/form/button/span')
time.sleep(2)
button_entry.click()
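For reference, the WebDriverWait variant mentioned above would look roughly like this (a sketch of the explicit-wait approach; it replaces the fixed sleeps but does not by itself avoid the 403):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait up to 15 seconds for the field to appear instead of sleeping blindly.
wait = WebDriverWait(driver, 15)
username_entry = wait.until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="login.email"]')))
username_entry.send_keys("login@mail.com")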
I just started learning programming and started with scraping with Python Selenium, but when I load the URL and send elements, the website keeps showing me: "Your browser is a bit unusual... Try disabling ad blockers and other extensions, enabling javascript, or using a different web browser."
I tried some of the solutions provided on the site, but none of them solved my problem.
Can you explain and solve the problem with Python, please?
import selenium
from selenium import webdriver
from time import sleep

# All options have to be configured before the driver is created;
# adding them to `options` afterwards has no effect.
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
options.add_argument("--headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])

driver = webdriver.Chrome('chromedriver.exe', options=options)
driver.set_window_size(620, 720)
driver.delete_all_cookies()
driver.implicitly_wait(5)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
driver.get('https://sso.godaddy.com/v1/account/create?realm=idp&path=%2Fcontact%2Fvalidate%3FcontactType%3DphoneMobile%26app%3Dsso%26path%3Dprofile%252Fedit%26profileUpdate%3DTrue%26userInteraction%3DPROFILE_UPDATE&app=sso&auth_reason=1&iframe=false')
I have a headless web scraper. When it runs, the scraper takes a base URL, scrapes the links on that page, and then scrapes the links it got off that page.
The problem I'm having is that when I run the scraper it exits almost immediately. When I run it normally (non-headless) it works perfectly fine.
These are my Selenium arguments:
options = webdriver.ChromeOptions()
options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'),
                          options=options)
I've also tried adding these options, but they gave me the same result:
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
How can I solve this? I'm trying to deploy this scraper to Heroku, and none of the things I've tried above have worked.
Basically, some websites won't load in headless mode unless a user agent is specified.
To fix this I added:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')
This fixed the problem of my scraper exiting immediately.
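Putting it together, a minimal sketch of the resulting setup (it reuses the environment variables from the question; only the user-agent line is new):
import os
from selenium import webdriver
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
options = webdriver.ChromeOptions()
options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_argument(f'user-agent={user_agent}')  # the fix: a real desktop UA
driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'),
                          options=options)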
I'm having some trouble trying to access a web site (bet365.com) with a Chrome driver and Selenium (I seem to be getting blocked).
I can access the site with my ordinary Chrome, but when I try with the Chrome driver, it doesn't work.
I had this problem before and fixed it by using the options below (Python):
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'PATH_TO\chromedriver.exe')
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
driver.get("https://www.bet365.com/")
Now the problem has come back, and this code no longer works to bypass the protection.
Can someone help me?
In case the Selenium-driven, ChromeDriver-initiated google-chrome Browsing Context is getting detected, a potential solution would be to use undetected-chromedriver to initialize the Chrome Browsing Context.
undetected-chromedriver is an optimized Selenium ChromeDriver patch which does not trigger anti-bot services like Distil Networks / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
# Use undetected_chromedriver's own ChromeOptions with the patched driver.
options = uc.ChromeOptions()
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
References
You can find a relevant detailed discussion in:
Undetected Chromedriver not loading correctly
I'm trying to build an Instagram bot that works headless, but it doesn't seem to find the username and password fields (i.e. it raises NoSuchElementException).
I tried to run this code to troubleshoot (it basically opens the Instagram homepage and screenshots it):
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument("--window-size=1920,1080")
browser = webdriver.Chrome(options=options)
browser.get("https://www.instagram.com")
browser.get_screenshot_as_file(f"screenshot.png")
and I got screenshots basically saying 'error, retry after several minutes' in French.
I tried finding the 'connectez-vous' button through Selenium, but every XPath I try fails, and I can't locate it through the F12 dev tools either.
The bot will later be uploaded to PythonAnywhere so I can run it in the cloud (so if you think I might run into other problems there, let me know).
What do you suggest I do?
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
#options.headless = True
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
browser = webdriver.Chrome(options=options)
browser.get("https://www.instagram.com")
sleep(5)
#browser.refresh()
browser.get_screenshot_as_file(f"screenshot.png")
For headless Chrome, the user agent is set to something like HeadlessChrome, which lets Instagram detect that you are using headless Chrome.
You can prevent this by specifying a hardcoded user agent:
open a normal Chrome window, go to the Network tab in the dev tools, open a request's headers, copy the user agent string, and use it in your code.
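You can confirm from inside the script that the override took effect (a quick check, assuming the setup above):
# Without the override, a headless session reports a user agent containing
# "HeadlessChrome", which is what Instagram keys on.
print(browser.execute_script("return navigator.userAgent;"))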
See also: Headless browser detection
Is there a way to log in to Twitter with Selenium/Python without triggering the "login from a new device" notification on Twitter?
My current code is:
import urllib.request
import selenium
from selenium import webdriver  # needed for webdriver.ChromeOptions() below
from bs4 import BeautifulSoup
import requests
import re
import json
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path='./chromedriver')
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
def twitter_login(driver, username, password):
    driver.get('https://twitter.com/login')
    driver.find_element_by_xpath("//input[contains(@name,'username')]").send_keys(username)
    driver.find_element_by_xpath("//input[contains(@name,'password')]").send_keys(password)
    driver.find_elements_by_xpath("//*[contains(text(), 'Log in')]")[1].click()

twitter_login(driver, username, password)
You can start Selenium with a Chrome profile that is already familiar with your Twitter account. That way, you won't be bothered by any notifications.
options.add_argument(r"--user-data-dir=C:\Users\Me\AppData\Local\Google\Chrome\User Data") # Enter 'chrome://version' in the adress bar and paste in the Profile Path
options.add_argument(r"--profile-directory=Profile 4") # The profile name can be found in the last bit of the Profile Path