Selenium python can't scrape a site

Selenium python can't scrape a site - python

i need Scraping website, but display "Checking your browser before accessing" and Prevents access to the site
Do I have to define a cookie or is there another solution?
from selenium import webdriver
from time import sleep
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--window-size=1920,1080")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0")
mainbrowser = webdriver.Chrome(chrome_options=options)
mainbrowser.get('https://trade.kraken.com/charts/KRAKEN:BTC-USDT')
sleep(20)

I have used the following options recently to avoid captcha detection on certain sites:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("./chrome_data") # Chrome Profile data (moved from ~/Library/Application Support/Google/Chrome)
options.add_argument("--user-data-dir=chrome-data")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
Furthermore I've made use of the library selenium-stealth (https://pypi.org/project/selenium-stealth/) which has incorporated many of the techniques used to avoid detection into a package:
driver = webdriver.Chrome(options=options)
stealth(
driver,
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36',
languages = ["en-US", "en"],
vendor = "Google Inc.",
platform = "Win32",
webgl_vendor = "Intel Inc.",
renderer = "Intel Iris OpenGL Engine",
fix_hairline = True,
run_on_insecure_origins = True)

Related

How to select specific certificate with Selenium in python?

So, I'm having a problem selecting a certificate with Selenium in Python. I've tried to select accepting it as if it were an alert, but without success. Could someone help me?
Example.
Here's an example of my driver settings.
options = Options()
if self.invisivel:
options.add_argument('headless')
options.add_argument('--log-level=3')
prefs = {"download.default_directory": self.diretorio}
options.add_experimental_option("prefs", prefs)
self.dc = DesiredCapabilities.CHROME
self.dc['goog:loggingPrefs'] = {'browser': 'ALL'}
options.add_argument("--window-size=1920,1200")
self.driver = webdriver.Chrome(ChromeDriverManager().install(), options=options,
desired_capabilities=self.dc)
stealth(self.driver,
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36',
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)

You are trying to accept the allert, right?
This should work:
driver.switch_to.alert.accept()

Login into zed.run using selenium

Tried to login in zedrun using selenium but failed.
I want to use metamask extension there so i am using profiles
but the website catches me everytime.
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\code\\AppData\\Local\\Google\\Chrome\\User Data")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument(
"user-agent= Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36")
options.add_argument("cookie: cookie here")
driver=webdriver.Chrome(options=options)

last problem when scraping bet365.com with selenium

After looking for information in the community, I have seen in a post that the next code worked until some days ago:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
browser=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
browser.execute_cdp_cmd('Network.setUserAgentOverride',
{"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4240.198 Safari/537.36'})
browser.get('https://www.bet365.com')
After that, the next worked as a solution:
Open the file chromedriver.exe with Notepad ++ and searched and replaced "cdc_" with "xyz_" and saved the file. And add this line to the options of the chromedriver: options.add_argument('--disable-blink-features=AutomationControlled')
I don't know why this don't work for me. I am using Chrome 88.0.4324.146 and the chromedriver version 88.0.4324.96, and executing this code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("window-size=1920,1080")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
browser=webdriver.Chrome(options=options,executable_path=r"chromedriver.exe")
browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})
browser.execute_cdp_cmd('Network.setUserAgentOverride',
{"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4240.198 Safari/537.36'})
browser.get('https://www.bet365.com')
But after executing the page gets stuck loading until it crash.

import subprocess
#other imports
subprocess.Popen(
'"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe" --remote-debugging-port=9222', shell=True)
options = webdriver.ChromeOptions()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)
driver.maximize_window()
driver.get('https://www.bet365.com')
It seems that the site detects the automation some how , work around is to open chrome using debug address and then connect selenium to this using above code . Change the chrome.exe according to your environment
Note: Make sure you close all the chrome browsers before running this script

Unable to access site with python selenium webdriver

I have been trying to build a universal scraper. But somehow there is some site that I am unable to access for some reason.
I have tried to use various options available on the internet to make sure that I avoided the bot detection flag but somehow the site apparently "detects" that I am a bot.
Here are the options I have been using.
```options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-data-dir=" + r'C:\Users\JEGSTUDIO\AppData\Local\Google\Chrome\selenium')
options.add_argument("window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)```
I see and compare the cookies, and it looks like this site is using Cloudflare js based on the cookie naming.
https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Challenge-Passage-Captcha-
Here is the full code so you guys can try
```from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-data-dir=" + r'C:\Users\JEGSTUDIO\AppData\Local\Google\Chrome\selenium')
options.add_argument("window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\JEGSTUDIO\Gochi\Scraping Project\Scraper - AITOPIA v2\chromedriver88crack.exe')
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.get("https://google.com")
input('Next site')
driver.get("https://www.harrods.com/")
input('enter to quit')
driver.quit()```
Any clue would be appreciated

options.add_argument("--remote-debugging-port=9222")
options.add_argument(
"user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
driver = webdriver.Chrome(options=options)
driver.maximize_window()
driver.get("https://www.harrods.com/")
Adding remote debugging port makes the site works

selenium set profile for chrome headless python

This is my code:
chrome_options = Options()
WINDOW_SIZE = "1920,1080"
path_profile = "C:\\Users\\xxxx\\AppData\\Local\\Google\\Chrome\\User Data"
chrome_options.add_argument("--user-data-dir="+path_profile)
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.add_argument("--enable-javascript")
chrome_options.add_argument('user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--ignore-ssl-errors')
chrome_options.headless = True
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options = chrome_options,executable_path=xxxxx))
driver.get('https://www.youtube.com/upload')
time.sleep(10)
driver.save_screenshot(dirname(abspath(__file__))+'/screen_shot.png')
driver.close()
In my profile, I have extension ads blocker and cookies login youtube
But when I screenshot,
I realize selenium has not used the profile yet. Is there a way to do this? Thanks

Set path_profile value to the full path of your profile, add your profile name at the end, e.g. Profile 1:
path_profile = "C:\\Users\\xxxx\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 1"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Selenium python can't scrape a site - python

Related

How to select specific certificate with Selenium in python?

Login into zed.run using selenium

last problem when scraping bet365.com with selenium

Unable to access site with python selenium webdriver

selenium set profile for chrome headless python

Categories

Resources