I am experiencing very strange behaviour when testing Chrome via the Selenium WebDriver.
Instead of navigating to pages as one would expect, the 'get' command only triggers the download of tiny files (with no file type, or .aspx files) from the target site.
Importantly, this behaviour only occurs when I pass chrome_options as an argument
to the Chrome driver.
The same test scripts work flawlessly with the Firefox driver.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options # tried with and without
proxy = '127.0.0.1:9951' # connects to a proxy application
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % proxy)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('whatismyip.com')
This leads to the automatic download of a file called 'download' (no file extension, size 2 bytes),
while calling other sites results in the download of small .aspx files.
All of this happens while the browser page remains blank and no interaction with
elements takes place, i.e. the site is not loaded at all.
No error message is thrown except 'element not found'.
This is really strange.
Additional info:
I run Debian Wheezy 32-bit and use Python 2.7.
Any suggestions on how to solve this issue?
I tried your code and captured the traffic on localhost using a SOCKS v5 proxy through SSH. It is definitely sending data through the proxy, but no data is coming back. I confirmed the proxy was working by using Firefox.
I'm running Google Chrome on Ubuntu 14.04 LTS 64-bit. My Chrome browser gives me the following message when I try to configure a proxy in its settings menu:
When running Google Chrome under a supported desktop environment, the
system proxy settings will be used. However, either your system is not
supported or there was a problem launching your system configuration.
But you can still configure via the command line. Please see man
google-chrome-stable for more information on flags and environment
variables.
Unfortunately I don't have a man page for google-chrome-stable.
I also discovered that, according to the Selenium documentation, Chrome uses the system-wide proxy settings, and it is unknown how to set the proxy in Chrome programmatically: http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#using-a-proxy
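That said, the Selenium Python bindings do expose a Proxy capability that can be handed to ChromeDriver. Whether Chrome honours it depends on the Chrome/chromedriver version, so this is only a sketch, written against the older (pre-4.x) Python bindings where desired_capabilities is still accepted; 127.0.0.1:9951 is just the local proxy from the question:
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

proxy_address = '127.0.0.1:9951'  # local proxy application from the question

# build a manual proxy configuration and merge it into the Chrome capabilities
proxy = Proxy()
proxy.proxy_type = ProxyType.MANUAL
proxy.http_proxy = proxy_address
proxy.ssl_proxy = proxy_address

capabilities = webdriver.DesiredCapabilities.CHROME.copy()
proxy.add_to_capabilities(capabilities)

driver = webdriver.Chrome(desired_capabilities=capabilities)
driver.get('http://whatismyip.com')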
Related
I am writing automation for one of our sites. I can access that site in Chrome when I open it manually, in a normal or incognito window. But when I try to launch it using automation in any browser, I get a 'site can not be reached' error.
Below is my simple code:
ChromeOptions options = new ChromeOptions();
options.AddArguments("excludeSwitches", "enable-automation");
options.AddArguments("useAutomationExtension", "False");
options.AddArgument("incognito");
IWebDriver driver = new ChromeDriver(options);
Console.WriteLine("Test MY Site");
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(60);
driver.Manage().Window.Maximize();
driver.Manage().Cookies.DeleteAllCookies();
driver.Navigate().GoToUrl("https://example.net");
I tried Selenium C#, Selenium Python, Cypress, etc., but I get the same error.
I also tried to launch the site in Chrome, Edge, Firefox and Safari; there I get the same error as well.
Note: I am using the latest version of Chrome, the latest version of Selenium and Windows 10. I also tried after downgrading Selenium and Chrome.
I have also tried setting the proxy manually and automatically, and also tried with no proxy setting. The internet connection is not the issue at all, since the site works fine when I use it manually in Chrome.
I found that undetected-chromedriver helps in this case, but I did not find an exe for undetected-chromedriver in order to launch the site.
Please help.
How to use normal Chrome completely without ChromeDriver in Selenium Python (not a duplicate).
I am using Python 3.8.8, the OS is Windows 7 Ultimate with PyCharm as the
IDE, and the Chrome version is around 96. My problem is that whenever I use my Python script to scrape a website it uses chromedriver, even when I specify what's given below:
options = Options()
options.add_argument(r"user-data-dir=...")  # my Chrome user data directory, not the executable path
# this works, but when opening Chrome it shows "browser is controlled by automated software",
# and changing it to the normal chrome.exe won't work
Sure, it uses normal Chrome with my credentials, but it still needs chromedriver to work, and when I delete chromedriver it throws an error. I also went into the Selenium source code, into a file called site.py (or sites.py), and changed the self.executable variable to the chrome.exe path. That worked and no longer showed the 'browser is controlled by automated software' message, but it won't do anything; it just gets stuck there. What I want to do is use Chrome as the browser to scrape, without chromedriver, on my PC. Is this possible? If yes, please tell me how I should go about it. You can ask for further clarification and details. Thanks in advance.
By default, Selenium is detected as automation software and is flagged by most websites, and that flag cannot be removed through Selenium itself. There are, however, external libraries that can be installed to remove the flag.
There are options here to try to get around the default flag and hide the fact that the browser is automated.
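For example, undetected-chromedriver (the library mentioned in the related question above) is one such option. A minimal sketch, assuming the package is installed via pip:
# pip install undetected-chromedriver
import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument('--incognito')

# the library downloads and patches its own chromedriver, so the usual
# "controlled by automated software" flagging is largely suppressed
driver = uc.Chrome(options=options)
driver.get('https://example.net')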
Edit
I understand the question further now and see that you want a more portable Chrome setup. ChromeDriver is a very specific program controlled by Selenium and must be used; there is no substitute. You can use the Firefox driver or the Internet Explorer driver instead, but a webdriver must be used (hence the name 'driver', for driving the main browser). When you specify the directory of the Chrome binary, you aren't removing the middleman of chromedriver, you are only specifying where chromedriver needs to look!
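To illustrate the distinction, here is a rough sketch (the paths are placeholders, and the executable_path keyword is the pre-Selenium-4 style): binary_location only tells chromedriver which Chrome binary to launch, while the chromedriver executable itself is still required:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# which Chrome binary chromedriver should launch (placeholder path)
options.binary_location = r"C:\path\to\chrome.exe"

# chromedriver is still the middleman; it cannot be removed
# (placeholder path; Selenium 4 would wrap this in a Service object)
driver = webdriver.Chrome(executable_path=r"C:\path\to\chromedriver.exe", options=options)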
Using Selenium, you won't be able to initiate/spawn a new browsing context, i.e. a Chrome browser session, without ChromeDriver.
The Parts and Pieces
As a minimum requirement, WebDriver (here ChromeDriver) talks to the browser through a driver, and the communication is two-way:
WebDriver passes commands to the browser through the driver,
and receives information back via the same route.
Hence using ChromeDriver is a mandatory requirement.
I am trying to open a website in Selenium Python, but it shows a blank page; when I open that website in normal Google Chrome it works.
Here is the code I am writing to open the website:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get("https://shop.coles.com.au/a/wentworth-point/home")
Errors I am getting in the Chrome console: 'Failed to load resource: the server responded with a status of 429 ()' and 'Failed to load resource: net::ERR_UNKNOWN_URL_SCHEME'.
You must specify the chromedriver path. Something like this:
https://seleniumbyexamples.github.io/navget
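The linked page shows the details; roughly, and assuming the driver was unpacked to a placeholder location such as /usr/local/bin, it boils down to something like this (Selenium 4 style with a Service object):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# point Selenium at the downloaded chromedriver binary (placeholder path)
service = Service(executable_path='/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service)
driver.get('https://shop.coles.com.au/a/wentworth-point/home')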
Verify that the version of chromedriver matches the version of the Chrome browser installed on your machine,
and make sure that the path to chromedriver is set in your PATH variable.
http://chromedriver.chromium.org/downloads
The error code 429 that you are getting means too many requests were sent in a given amount of time. Just try to clean your build history and run it again. Regarding the version of Chrome: if that were the issue, the browser would not have opened in the first place, and you would have got another exception such as SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 83.
Just to be sure, you may check the version of Chrome you are using against the driver downloaded on the system; if they do not match, download the latest chromedriver from the web and either provide it in the executable path to Chrome or add it to your system's environment variables.
I have an Apache server up and running with Python 3.x already installed on it. Right now I am trying to run a little Python program (let's say filename.py) ON the server. But this Python program uses the Chrome webdriver from Selenium. It also uses sleep from time (but I think this comes by default, so I figure it won't be a problem).
from selenium import webdriver
When I coded this program for the first time on my computer, I not only had to write the line of code above but also had to manually download the webdriver for Chrome and paste it into /usr/local/bin. Here is the link to the file in case you wonder: Webdriver for Chrome
Anyway, I do not know what the equivalent steps are to configure this on my server. Do you have any idea how to do it? Or any concepts I could learn related to installing packages on an Apache server?
Simple solution:
You don't need to install the driver in /usr/local/bin. You can have the driver executable anywhere and specify its location with an executable path; see here for an example.
Solution for running on a server
If you have Python installed on the server, ideally >3.4, it comes with pip by default. Then install ChromeDriver on the standalone server; follow the instructions here.
Note that Selenium always needs an instance of a browser to control.
Luckily, there are browsers out there that aren't as heavy as the usual browsers you know. You don't have to open IE / Firefox / Chrome / Opera. You can use HtmlUnitDriver, which controls HtmlUnit (a headless Java browser without any UI), or PhantomJSDriver, which drives PhantomJS (another headless browser running on WebKit).
Those headless browsers are much less memory-heavy, usually faster (since they don't have to render anything), and don't require a graphical interface to be available on the machine they run on; they are therefore easily usable server-side.
Sample code of headless setup
op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(options=op)
It's also worth reading up on running Selenium RC; see here for that.
I am web-scraping Bitcoin quotations from Coinsuper. It is a JavaScript page. When I first developed my code on Windows using Python 3.7, Selenium, and Chromium, it worked well.
I want to deploy this code on my server to fetch data continuously. However, it doesn't work under Linux.
I am sure my code can work, at least on most websites, including Apple, Google, Baidu, Xueqiu, etc.
For the OS, I have tried Debian 9 and Ubuntu 18.04.
For webdriver, I have tried both Chrome and Firefox.
For webdriver parameters, I have tried:
Add header, including fake-useragent
Ignore SSL certificate
Disable GPU
These made no difference (a rough sketch of how they were set is below).
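For reference, these were applied roughly like this (a reconstructed sketch, not the exact original script; the fake-useragent package and the flag names are assumptions):
from selenium import webdriver
from fake_useragent import UserAgent  # pip install fake-useragent

chrome_options = webdriver.ChromeOptions()

# fake a desktop user agent instead of the headless default
ua = UserAgent()
chrome_options.add_argument('user-agent=' + ua.random)

# ignore SSL certificate errors
chrome_options.add_argument('--ignore-certificate-errors')

# disable the GPU (Linux only)
chrome_options.add_argument('--disable-gpu')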
I think it might be because Coinsuper has some anti-scraping strategy. But I am also confused about why similar code can work on Windows but not on Linux. Are there any differences that might cause this situation?
The code:
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu') # Only included in Linux version
chrome_options.add_argument('--no-sandbox') # Only included in Linux version
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.coinsuper.com/trade')
print(driver.page_source)
driver.quit()
I am the one who asked this question. Thank you all for helping me! Finally, I have solved this problem.
@furas showed that my code could actually get responses from Coinsuper.
@Dalvenjia suggested that this might be caused by an IP blacklist, which is very likely for cloud servers. And yes, I am using a cloud server.
Here is the solution:
Start a Shadowsocks server from my home IP address (or use any proxy you have).
Start the Shadowsocks client on the server.
Add one more argument to the ChromeOptions in the Python script:
chrome_options.add_argument('--proxy-server=socks5://127.0.0.1:xxxx')
Now I can get contents by bypassing the IP blacklist.
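Putting it together with the options from the question, the relevant part of the script looks roughly like this (xxxx stays a placeholder for whatever port the local Shadowsocks client listens on):
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')   # Only included in Linux version
chrome_options.add_argument('--no-sandbox')    # Only included in Linux version
# route all browser traffic through the local Shadowsocks (SOCKS v5) client;
# replace xxxx with the port the local client listens on
chrome_options.add_argument('--proxy-server=socks5://127.0.0.1:xxxx')

driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.coinsuper.com/trade')
print(driver.page_source)
driver.quit()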
I recommend using the WebDriverManager dependency:
https://github.com/bonigarcia/webdrivermanager
By using WebDriverManager, you don't need to download drivers or manage driver paths in code.
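The linked project is the Java library; for the Python code elsewhere in this thread, a similar idea exists as the webdriver-manager package on PyPI. A minimal sketch, assuming that package is installed:
# pip install webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# downloads a chromedriver matching the installed Chrome and returns its path
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get('https://example.net')
driver.quit()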