How to run a script in a headless mode - python

There is a script-parser site written in Python with Selenium, if I run it in headless mode, so as not to open the browser window, it can not find the desired item and spar the information from it. If I run it without headless mode, it works fine
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r"chromedriver.exe")
driver.get(f'''https://www.wildberries.ru/catalog/38862450/detail.aspx?targetUrl=SP''')
time.sleep(20)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(20)
element = driver.find_element(by=By.CLASS_NAME, value="address-rate-mini")
btn = driver.find_elements(by=By.CLASS_NAME, value="btn-base")
It can't find the btn I need

Browser Detection:
Some webpages can detect that you are scraping their site by looking at your user agent. A normal browsing agent looks like this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
However in headless mode, your browsing agent looks like this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
Whatever site you are scraping probably detected the "Headless" tag and possibly restricting the btn element from you.
Solution:
A simple solution to this would be to add the following Selenium option to programmatically change your user agent:
options.add_argument('--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"')
More Information:
Python selenium headless mode missing elements

Related

Login into zed.run using selenium

Tried to login in zedrun using selenium but failed.
I want to use metamask extension there so i am using profiles
but the website catches me everytime.
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\code\\AppData\\Local\\Google\\Chrome\\User Data")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument(
"user-agent= Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36")
options.add_argument("cookie: cookie here")
driver=webdriver.Chrome(options=options)

Unable to access site with python selenium webdriver

I have been trying to build a universal scraper. But somehow there is some site that I am unable to access for some reason.
I have tried to use various options available on the internet to make sure that I avoided the bot detection flag but somehow the site apparently "detects" that I am a bot.
Here are the options I have been using.
```options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-data-dir=" + r'C:\Users\JEGSTUDIO\AppData\Local\Google\Chrome\selenium')
options.add_argument("window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)```
I see and compare the cookies, and it looks like this site is using Cloudflare js based on the cookie naming.
https://support.cloudflare.com/hc/en-us/articles/200170136-Understanding-Cloudflare-Challenge-Passage-Captcha-
Here is the full code so you guys can try
```from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("user-data-dir=" + r'C:\Users\JEGSTUDIO\AppData\Local\Google\Chrome\selenium')
options.add_argument("window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\JEGSTUDIO\Gochi\Scraping Project\Scraper - AITOPIA v2\chromedriver88crack.exe')
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.get("https://google.com")
input('Next site')
driver.get("https://www.harrods.com/")
input('enter to quit')
driver.quit()```
Any clue would be appreciated
options.add_argument("--remote-debugging-port=9222")
options.add_argument(
"user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
driver = webdriver.Chrome(options=options)
driver.maximize_window()
driver.get("https://www.harrods.com/")
Adding remote debugging port makes the site works

HotJar suspicious UserAgent error, nothing on google, Trying to run a python scraper tool for sports odds tracking

c:\Users\Administrator\Downloads\cloudbet(2).py:47: DeprecationWarning: use options instead of chrome_options
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r"C:\Python\Python38\Scripts\chromedriver.exe")
DevTools listening on ws://127.0.0.1:54368/devtools/browser/77f2d50d-65c6-49bf-af5b-7923f3d40bfc
[0619/100243.424:INFO:CONSOLE(109)] "Hotjar not launching due to suspicious userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/84.0.4147.56 Safari/537.36", source: https://www.cloudbet.com/static/js/2.08e072ab.chunk.js (109)
I have faced same issue "with some websites". It was related to running Google Chrome webdriver in headless mode and that is showing in your post details.
"Hotjar not launching due to suspicious userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/84.0.4147.56 Safari/537.36"
The following options could help in resolving the issue:
Running Chrome driver in normal mode instead of headless. Namely, remove chrome_options.add_argument("--headless") option.
Running Chrome driver in headless mode, but bypass the auto generated user-agent with a normal-mode user agent by adding the following argument (as an example of that scenario) :
userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.56 Safari/537.36"
chrome_options.add_argument(f'user-agent={userAgent}')
Try using headless Firefox instead.
there is a Cloudbet Odds API available under https://docs.cloudbet.com/ that should make it easier to access odds.
Is there anything missing that you're looking for?

Docker Selenium Chromedriver: Unfortunately, automated access to this page was denied

I am using selenium chromedriver in my python project.
The application is running under Docker.
When I try to access http://mobile.de website I got rejected stating:
Unfortunately, automated access to this page was denied.
Here is my initialization code:
CHROME_DRIVER_PATH = os.path.abspath('assets/chromedriver')
chrome_options = ChromeOptions()
chrome_options.binary_location = "/usr/bin/google-chrome"
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
self.web_driver_chrome = webdriver.Chrome(executable_path=CHROME_DRIVER_PATH, options=chrome_options)
And here is my send request code:
def get_page_content(self, url):
url = "https://www.mobile.de/"
self.web_driver_chrome.get(url)
print(self.web_driver_chrome.page_source)
return self.web_driver_chrome.page_source
Is there any way I can pass this "automated access check"?
when using --headless it append HeadlessChrome to the user-agent
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/71.0.3578.98 Safari/537.36
The solution is adding argument to set normal user-agent
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
chrome_options.add_argument('user-agent=' + user_agent)

How to rotate various user agents using selenium python on each request

I want to make 10 requests to https://www.google.com/ but with random user agents using selenium and python. I've a loop and inside that loop I'm making 10 requests with random user agents (using fake-user agent). The main problem is for every request web driver is opening a new instance of google chrome and I want to do this in one single instance but with different user agents. How can I make this possible ? 1 google chrome instance and 10 requests with 10 random user agents. Here is my code:
chrome_options = Options()
chrome_options.add_argument('no-sandbox')
chrome_options.add_argument("--start-maximized")
ua = UserAgent()
for i in range(0, 10):
userAgent = ua.random
chrome_options.add_argument('--user-agent="' + str(userAgent) + '"')
driver1 = webdriver.Chrome(chrome_options=chrome_options,
executable_path="C:/Python34/chromedriver")
driver1.get('https://www.google.com/')
time.sleep(5)
First the update 1
execute_cdp_cmd(): With the availability of execute_cdp_cmd(cmd, cmd_args) command now you can easily execute google-chrome-devtools commands using Selenium. Using this feature you can modify the user-agent easily to prevent Selenium from getting detected.
Code Block:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe')
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.97
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.53
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')
Console Output:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36
Browser Snapshot:
Legend: 1 - Applicable only to Selenium python clients.
Originally answered Nov 6 '18 at 8:00
No. When you configure an instance of a ChromeDriver with ChromeOptions to initiate a new Chrome Browser Session the configuration of the ChromeDriver remains unchanged throughout the lifetime of the ChromeDriver and remains uneditable. So you can't change the user agent when the WebDriver instance is executing the loop making 10 requests.
Even if you are able to extract the ChromeDriver and ChromeSession attributes e.g. UserAgent, Session ID, Cookies and other session attributes from the already initiated Browsing Session still you won't be able to change those attributes of the ChromeDriver.
A cleaner way would be to call driver.quit() within tearDown(){} method to close and destroy the ChromeDriver and Chrome Browser instances gracefully and then span a new set of ChromeDriver and Chrome Browser instance with the new set of configurations.
Here you can find a relevant discussion on How can I reconnect to the browser opened by webdriver with selenium?
Reference
You can find a couple of relevant detailed discussions in:
How to change the User Agent using Selenium and Python
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Yes. It is possible now with cdp:
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser1"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser2"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser3"}})
driver.get('https://www.httpbin.org/headers')
it open 10 chrome instance because you did notclose() it, try
...
...
driver1.get('https://www.whatsmyua.info/')
time.sleep(5)
driver1.close()

Categories