I'm making an app (using Selenium WebDriver with Chrome) that searches Google for a specified query (http://www.google.com/search?query), and every time I search I want to change my IP, so I'm using proxies.
The problem is that Google blocks EVERY proxy I use. Is there any way to bypass this? Maybe I'm using the wrong type of proxy? (I've tried HTTP and HTTPS proxies; they still get blocked every time.)
Maybe my code is wrong?
from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
options = Options()
options.binary_location = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
options.add_argument("disable-extensions")
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument(f"--proxy-server=ip:port")
driver = Chrome(options=options, executable_path="C:/WebDriver/bin/chromedriver.exe")
driver.get("http://www.google.com/search?query")
Could it be a matter of proxy quality?
Google has removed the proxy support for FTP entirely in Google Chrome versions 76 and newer. You can use Firefox or Edge. I tried with Firefox and was able to launch it:
options = Options()
options.binary_location = r"C:\Program Files\Mozilla Firefox\Firefox.exe"
options.add_argument("disable-extensions")
options.add_argument("start-maximized")
options.add_argument(f"--proxy-server=ip:port")
driver = webdriver.Firefox(executable_path=r'..\drivers\geckodriver.exe', options=options)
Import:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
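As a minimal sketch of the two strings the question's code passes around: Google's search endpoint reads the search terms from the `q` parameter, and Chrome's `--proxy-server` switch takes a `host:port` value with an optional scheme. The helper names, the example proxy address and the example query below are assumptions for illustration, not part of the original posts.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a search URL; the search terms go in the `q` parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def proxy_argument(host, port, scheme="http"):
    """Build the value passed to options.add_argument() for a proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


# Hypothetical values, for illustration only.
print(google_search_url("selenium proxy"))
print(proxy_argument("203.0.113.10", 8080))
```

The resulting strings would be passed to `driver.get(...)` and `options.add_argument(...)` respectively. This only fixes how the request is formed; it does not by itself stop a proxy from being blocked.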
I want to scrape some data with Selenium in Python. Sometimes I get this type of screen:
Do you know how to proceed in order to remove this type of verification? Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
options.add_argument("start-maximized")
options.add_argument("--disable-web-security")
options.add_argument("--disable-site-isolation-trials")
options.add_argument("--allow-running-insecure-content")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('THE_WEBSITE_COM')
Selenium and other automation tools expose certain user agents and other identifiers that indicate the browser is automated, so try experimenting with those. Some websites also use anti-bot tools that analyze browsing behaviour and patterns, so try slowing your script down and randomizing it, e.g. with a random wait between page requests.
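The "random time between page requests" idea could look roughly like this. It is a sketch only: the function name and the default bounds of 2 and 6 seconds are assumptions, and a random pause is no guarantee that a site's bot detection will not trigger.

```python
import random
import time


def random_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds].

    Returns the chosen delay so callers can log it. The `sleep`
    parameter exists only so the pause can be stubbed out in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

In a scraping loop you would call `random_pause()` after each `driver.get(url)` so that requests are not sent at a fixed rhythm.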
Another trick is to look at the website and try to find if there are any alternative routes to get the information. For example: is there a public API you can use to bypass it? Is there a mobile version of the website? Sometimes mobile versions have less aggressive Captcha enforcement.
What I do most of the time is launch the browser separately and connect to it through the remote debugging port, which is explained well in this article.
To make Chrome open a port for remote debugging, we need to launch it with a custom flag:
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\selenum\ChromeProfile"
Then connect to the browser using this
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
# Change the chromedriver path accordingly
chrome_driver = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver, options=chrome_options)
print(driver.title)
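Before attaching, it can help to confirm that the separately launched Chrome is actually listening on the debugging port. The following is a small sketch using only the standard library; the function name is hypothetical, and the defaults simply mirror the `127.0.0.1:9222` address used above.

```python
import socket


def debug_port_is_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `debug_port_is_open()` returns False, Chrome was probably not started with `--remote-debugging-port=9222`, or another instance using the same profile was already running without that flag.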
I am trying to access our internal company site to take a screenshot of it using headless Chrome on Red Hat Linux.
For this I am using Python, Selenium, Poppler and ChromeDriver.
It works perfectly on Windows. On non-GUI Linux, however, without options.add_argument('--ignore-certificate-errors') it returns a blank white page, and with the ('ignore-certificate-errors') option added it gives a 401 error.
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Java snippet (not valid Python), commented out so the rest of the script parses:
# DesiredCapabilities handlSSLErr = DesiredCapabilities.chrome ()
# handlSSLErr.setCapability (CapabilityType.ACCEPT_SSL_CERTS, true)
# WebDriver driver = new ChromeDriver (handlSSLErr);
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--headless')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path=os.path.join(FLASK_STATIC_FOLDER,'chromedriver'),options=options)
URL = '"our internal webpage/"%s' %int(facemapperid)
driver.get(URL)
Any suggestions are welcome.
The option to ignore certificate errors is
options.add_argument('--ignore-certificate-errors')
You left out the leading --.
I was able to achieve what I wanted by doing the following.
First I made a connection so that it would cache my cookie:
driver.get("https://username:password@mywebsite")
and then did it again:
URL = 'username:password@mywebsite'
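The URL above carries the credentials in front of the host. A minimal sketch of building such a URL safely is shown here, assuming the site accepts HTTP basic authentication; the function name and the example host are hypothetical. Percent-encoding matters because characters such as `@` or `:` inside a password would otherwise be parsed as URL delimiters.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, password):
    """Return `url` with percent-encoded credentials placed before the host."""
    parts = urlsplit(url)
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical host and credentials, for illustration only.
print(with_basic_auth("https://intranet.example/page/42", "user", "p@ss:word"))
```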
I am working on a project that uses Selenium to scrape data, but I don't want the browser to open and pop up. I want to hide the browser and also keep it out of the taskbar...
Some also suggested using PhantomJS, but I didn't understand how. What should I do now?
If you're using Chrome you can just set the headless argument like so:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
driver_exe = 'chromedriver'
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(driver_exe, options=options)
For chrome you could pass in the --headless parameter.
Alternatively you could let selenium work on a virtual display like this:
from selenium import webdriver
from xvfbwrapper import Xvfb
display = Xvfb()
display.start()
driver = webdriver.Chrome()
driver.get('http://www.stackoverflow.com')
print(driver.title)
driver.quit()
display.stop()
The latter has worked for me quite well.
To hide the browser while running tests with Selenium's Python client, you can use the minimize_window() method, which minimizes the Chrome window and effectively pushes it to the background:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://www.google.co.in')
driver.minimize_window()
Alternative
As an alternative, you can use the headless option to configure ChromeDriver to start Google Chrome in headless mode through Selenium. You can find relevant discussion in:
How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
If you're using Firefox, try this:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
driver_exe = 'path/to/firefoxdriver'
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(executable_path=driver_exe, options=options)
This is similar to what @Meshi answered for Chrome.
If you want to hide the Chrome window that Selenium opens, there is a library for that, pyautogui:
import pyautogui
window = [x for x in pyautogui.getAllWindows()]
This gives you all open windows, each with its title.
Now you need to find your window:
for i in window:
    if 'Google Chrome' in i.title:
        i.hide()
Alternatively, you can match against your driver's title.
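Matching on the driver's title rather than on 'Google Chrome' could be sketched as follows. The helper name is hypothetical, and the stand-in objects replace real window objects so the snippet runs without a display; with pyautogui you would pass `pyautogui.getAllWindows()` and `driver.title` instead.

```python
from types import SimpleNamespace


def windows_with_title(windows, fragment):
    """Return the windows whose title contains `fragment` (case-insensitive)."""
    needle = fragment.lower()
    return [w for w in windows if needle in (w.title or "").lower()]


# Stand-ins for window objects; only the `.title` attribute is used here.
fake_windows = [
    SimpleNamespace(title="Stack Overflow - Google Chrome"),
    SimpleNamespace(title="Untitled - Notepad"),
]
for w in windows_with_title(fake_windows, "stack overflow"):
    print(w.title)
```

Filtering by the page title hides only the window that Selenium drives, instead of every Chrome window on the desktop.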
I am working on downloading a HAR from Chrome for YouTube through a Selenium Python script.
Code Snippet:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(url))
chrome_options.add_argument("--enable-quic")
self.driver = webdriver.Chrome(chromedriver,chrome_options = chrome_options)
self.proxy.new_har(args['url'], options={'captureHeaders': True})
self.driver.get(args['url'])
result = json.dumps(self.proxy.har, ensure_ascii=False)
I want QUIC to be used whenever I download the HAR, but when I look at the packets in Wireshark, the Selenium-driven browser is using TCP only. Is there a way to force ChromeDriver to use QUIC? Or is there an alternative to BMP?
A similar thing was asked for Firefox in the question "How to capture all requests made by page in webdriver? Is there any alternative to Browsermob?", and there was a solution using Selenium alone, without any need for BMP. Is the same possible for Chrome?
A workaround for this problem could be to start Chrome normally (with your default profile, or create another profile) and enable QUIC manually, then start ChromeDriver with that profile loaded.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=/home/user/.config/google-chrome")
driver = webdriver.Chrome(executable_path="/home/user/Downloads/chromedriver", chrome_options=options)
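One way to check what a captured HAR actually recorded, without going to Wireshark, is to tally the `httpVersion` field that the HAR format defines on each entry's response. This is a sketch: the function name is hypothetical, and the exact version strings depend on whichever tool wrote the HAR.

```python
import json
from collections import Counter


def count_http_versions(har_text):
    """Count response httpVersion values across all entries of a HAR document."""
    har = json.loads(har_text)
    entries = har.get("log", {}).get("entries", [])
    return Counter(
        entry.get("response", {}).get("httpVersion", "unknown")
        for entry in entries
    )
```

With the question's code, `count_http_versions(result)` would take the JSON string produced by `json.dumps(self.proxy.har, ...)` and return a mapping from version string to number of responses.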
How do you go about disabling the cache in Chrome using Selenium WebDriver in Python?
I did this, but when I check in the browser it does not work:
chrome_options = Options()
chrome_options.add_argument("--disable-application-cache")
browser = webdriver.Chrome(executable_path = path_to_chromedriver, chrome_options=chrome_options)
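A different approach that may be worth trying is to disable the cache through the Chrome DevTools Protocol rather than a command-line switch. The sketch below assumes a Selenium version whose Chrome driver exposes `execute_cdp_cmd`; the helper name is hypothetical, and it is not taken from the original post.

```python
def disable_browser_cache(driver):
    """Ask Chrome, via the DevTools Protocol, to bypass its HTTP cache.

    `driver` is assumed to be a Chrome WebDriver instance that
    provides execute_cdp_cmd(command_name, parameters).
    """
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd("Network.setCacheDisabled", {"cacheDisabled": True})
```

You would call `disable_browser_cache(browser)` once after creating the driver and before the first `browser.get(...)`.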