On my website, when a user executes an operation, the server starts a new Chrome WebDriver session (Python Selenium). For monitoring purposes I need to identify each opened browser.
UA = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.{NNR} (KHTML, like Gecko) Chrome/42.0.2288.6 Safari/537.{NNR}".format(NNR=NNR)
options = webdriver.ChromeOptions()
options.add_argument('--user-agent={UA}'.format(UA=UA))
options.add_argument("--lang=it");
options.add_argument("--test-type")
self.driver = webdriver.Chrome(chrome_options=options)
I need a solution where each opened browser is associated with a name that is visible to the human eye. How can I give a name to a browser in Selenium?
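One possible approach, sketched below under the assumption that it runs right after the driver from the snippet above is created: the NNR value already embedded in the user agent can be read back to identify the instance programmatically, and a human-visible label can be set through the page title. The label text and the about:blank navigation are illustrative, not from the original code.

# Identify this instance programmatically: the custom user agent embeds NNR
print(self.driver.execute_script("return navigator.userAgent;"))
# Give the window a label visible to the human eye (shown in the tab title);
# note the title is reset whenever a new page is loaded
self.driver.get("about:blank")
self.driver.execute_script("document.title = 'session-{NNR}';".format(NNR=NNR))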
I just started learning programming and began with web scraping using Python Selenium, but when I load the URL and send elements, the website keeps responding with: "Your browser is a bit unusual...
Try disabling ad blockers and other extensions, enabling JavaScript, or using a different web browser."
I tried some of the solutions provided on the site, but none of them solved my problem.
Can you explain and solve the problem with Python, please?
from selenium import webdriver
from time import sleep

options = webdriver.ChromeOptions()
options.add_argument("--incognito")
options.add_argument("--headless")
# Options must be configured before the driver is created; adding them afterwards has no effect
options.add_experimental_option('excludeSwitches', ['enable-logging'])

driver = webdriver.Chrome('chromedriver.exe', options=options)
driver.set_window_size(620, 720)
driver.delete_all_cookies()
driver.implicitly_wait(5)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
driver.get('https://sso.godaddy.com/v1/account/create?realm=idp&path=%2Fcontact%2Fvalidate%3FcontactType%3DphoneMobile%26app%3Dsso%26path%3Dprofile%252Fedit%26profileUpdate%3DTrue%26userInteraction%3DPROFILE_UPDATE&app=sso&auth_reason=1&iframe=false')
I have a site-scraping script written in Python with Selenium. If I run it in headless mode, so as not to open a browser window, it cannot find the desired element and parse the information from it. If I run it without headless mode, it works fine.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r"chromedriver.exe")
driver.get('https://www.wildberries.ru/catalog/38862450/detail.aspx?targetUrl=SP')
time.sleep(20)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(20)
element = driver.find_element(by=By.CLASS_NAME, value="address-rate-mini")
btn = driver.find_elements(by=By.CLASS_NAME, value="btn-base")
It can't find the btn-base button I need.
Browser Detection:
Some web pages can detect that you are scraping their site by looking at your user agent. A normal user agent looks like this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
However, in headless mode, your user agent looks like this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
Whatever site you are scraping probably detected the "HeadlessChrome" token and may be restricting the btn-base element from you.
Solution:
A simple solution to this would be to add the following Selenium option to programmatically change your user agent:
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36')
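Putting it together, a minimal sketch of the fix, assuming a Selenium 4 setup where chromedriver is resolved from PATH; the explicit wait that replaces the fixed sleeps is my addition, not part of the original answer:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("start-maximized")
# Present a regular Chrome user agent instead of HeadlessChrome
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36")

driver = webdriver.Chrome(options=options)
driver.get("https://www.wildberries.ru/catalog/38862450/detail.aspx?targetUrl=SP")
# Wait for the buttons to appear instead of sleeping a fixed 20 seconds
buttons = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "btn-base"))
)
print(len(buttons))
driver.quit()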
More Information:
Python selenium headless mode missing elements
Recently I have been trying to scrape Facebook pages/users without logging in (I know logging in could solve many issues, but it also risks an account ban). However, I have found that the page source returned does not match what I see in a normal Chrome browser.
For example, for a Facebook Page you can see its posts in normal Chrome (not logged in, Incognito mode).
For a Facebook user you can likewise see their posts in normal Chrome (not logged in, Incognito mode).
However, in Selenium I am unable to retrieve the post content (actually, all I need is the newest post ID).
The code is here for reference:
import os
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36")
chrome_options.add_argument("--enable-javascript")
driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"})
driver.implicitly_wait(10)
driver.get("https://www.facebook.com/<Page or User>")
print(driver.page_source)
I have tried adding/removing user-agent settings, but with no luck. The user agent in the code is my current browser's.
What I want is to retrieve the post content (or only the post ID) in the Selenium Chrome driver the same way it appears in normal Chrome. Thank you.
I have a headless web scraper. When it runs, the scraper takes a base URL, scrapes the links on that page, and then scrapes the links it got off that page.
The problem I'm having is that when I run the scraper headless, it pretty much immediately exits. When I run the scraper normally (non-headless), it works perfectly fine.
These are my selenium arguments:
options = webdriver.ChromeOptions()
options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'),
options=options)
I've also tried adding these options but it gave me the same result:
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
How can I solve this? I'm trying to deploy this scraper to Heroku, and none of the things I've tried above have worked.
Basically, some websites won't load in headless mode unless a user agent is specified.
To fix this I added:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')
This fixed the problem of my scraper exiting immediately
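As a quick sanity check (my addition, not part of the original answer), you can print the user agent the headless browser actually reports:

print(driver.execute_script("return navigator.userAgent;"))
# Expect plain Chrome/..., not HeadlessChrome/..., after the override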
I want to make 10 requests to https://www.google.com/ with random user agents using Selenium and Python. I have a loop, and inside that loop I make 10 requests with random user agents (using fake-useragent). The main problem is that for every request the web driver opens a new instance of Google Chrome, and I want to do this in one single instance but with different user agents. How can I make this possible: one Google Chrome instance and 10 requests with 10 random user agents? Here is my code:
import time

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('no-sandbox')
chrome_options.add_argument("--start-maximized")
ua = UserAgent()
for i in range(0, 10):
    userAgent = ua.random
    chrome_options.add_argument('--user-agent="' + str(userAgent) + '"')
    # A new driver (and a new Chrome window) is created on every iteration
    driver1 = webdriver.Chrome(chrome_options=chrome_options,
                               executable_path="C:/Python34/chromedriver")
    driver1.get('https://www.google.com/')
    time.sleep(5)
First, the update:
execute_cdp_cmd(): With the availability of the execute_cdp_cmd(cmd, cmd_args) command, you can now easily execute google-chrome-devtools commands using Selenium. Using this feature you can modify the user agent easily to prevent Selenium from getting detected.
Code Block:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe')
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.97
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.53
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')
Console Output:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36
Note: execute_cdp_cmd() is applicable only to Selenium Python clients.
Originally answered Nov 6 '18 at 8:00
No. When you configure an instance of ChromeDriver with ChromeOptions to initiate a new Chrome browsing session, the configuration of the ChromeDriver remains unchanged (and uneditable) throughout the lifetime of the ChromeDriver. So you can't change the user agent while the WebDriver instance is executing the loop making the 10 requests.
Even if you are able to extract the ChromeDriver and Chrome session attributes, e.g. UserAgent, Session ID, Cookies and other session attributes, from the already initiated browsing session, you still won't be able to change those attributes of the ChromeDriver.
A cleaner way would be to call driver.quit() within the tearDown() method to close and destroy the ChromeDriver and Chrome browser instances gracefully, and then spawn a new set of ChromeDriver and Chrome browser instances with the new set of configurations.
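A minimal sketch of that quit-and-respawn approach, assuming the fake-useragent package is installed and a Selenium 4 setup that resolves chromedriver automatically; the details are illustrative, not taken from the original answer:

from fake_useragent import UserAgent
from selenium import webdriver

ua = UserAgent()
for i in range(10):
    options = webdriver.ChromeOptions()
    options.add_argument("no-sandbox")
    options.add_argument("--start-maximized")
    options.add_argument("user-agent=" + ua.random)  # fresh random user agent per run
    driver = webdriver.Chrome(options=options)       # new ChromeDriver + Chrome instance
    driver.get("https://www.google.com/")
    driver.quit()                                    # destroy them gracefully before the next run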
Here you can find a relevant discussion on How can I reconnect to the browser opened by webdriver with selenium?
Reference
You can find a couple of relevant detailed discussions in:
How to change the User Agent using Selenium and Python
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Yes, it is now possible with CDP (Chrome DevTools Protocol):
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser1"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser2"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser3"}})
driver.get('https://www.httpbin.org/headers')
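Building on this, a minimal sketch of the single-instance loop from the question, assuming the fake-useragent package is installed; using Network.setUserAgentOverride here (which also updates navigator.userAgent) is my choice rather than part of this answer:

from fake_useragent import UserAgent
from selenium import webdriver

ua = UserAgent()
driver = webdriver.Chrome()  # one Chrome instance reused for all requests
for i in range(10):
    driver.execute_cdp_cmd(
        "Network.setUserAgentOverride",
        {"userAgent": ua.random},  # random user agent for this request
    )
    driver.get("https://www.httpbin.org/headers")
driver.quit()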
It opens 10 Chrome instances because you did not close() them. Try:
...
...
driver1.get('https://www.whatsmyua.info/')
time.sleep(5)
driver1.close()