Certain Aspects of a Website Are Being Blocked When Using Selenium - python

I have been trying to write a script that logs in and posts an ad. However, when I direct Selenium to either the "post an ad" page or the "login" page, the page doesn't load and I get a solid white screen. When I go to the homepage or any other page, the website loads just fine.
My problem solving:
I have read that my IP might have been blocked; however, when I change my public IP address through my router, the problem persists.
I am currently unsure whether the problem stems from my IP or whether the website's scripts detect and block automation only on certain pages.
I am using chromedriver version 81.0.4044.69 and Chrome version 81.0.4044.129 (Official Build) (64-bit).
Here is my code:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.kijiji.ca/t-login.html?targetUrl=L3Atc2VsZWN0LWNhdGVnb3J5Lmh0bWw/XnIrUFRKMS9oU1cxc29PdXAxbjUveFE9PQ--')
The result is a blank white screen when running the program, with no errors.

Try faking the user agent (some sites really are picky about that) using the fake_useragent module:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent
options = Options()
ua = UserAgent()
userAgent = ua.random
print(userAgent)
options.add_argument(f'--user-agent={userAgent}')
driver = webdriver.Chrome(options=options)
driver.get("https://www.google.co.in")
driver.quit()

Related

using proxy with selenium webdriver firefox in python not working

I would like to make a program that changes my IP when I visit a site with Selenium. I use the Firefox webdriver. Unfortunately, the site I use for my tests returns my own IP and not the IP that I specified in the options. Could you tell me what the error is, please?
The program launches and the Firefox page opens (I don't use headless mode for the test), but it is my IP that is returned and not the one from the specified proxy.
Here is my program:
from selenium import webdriver
options = webdriver.FirefoxOptions()
proxy = f'{"137.74.65.101"}:{"80"}'
options.add_argument(f'--proxy-server={proxy}')
driver = webdriver.Firefox(options=options)
driver.get('https://httpbin.org/ip')
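One likely cause, offered here as an assumption rather than a confirmed diagnosis: --proxy-server is a Chromium command-line switch, and Firefox does not act on it. Firefox reads its proxy settings from preferences instead. A minimal sketch follows; apply_http_proxy and make_firefox_with_proxy are hypothetical helper names, and the host and port are the ones from the question.

```python
def apply_http_proxy(options, host, port):
    """Set manual HTTP/HTTPS proxy preferences on a FirefoxOptions-like object."""
    options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
    options.set_preference("network.proxy.http", host)
    options.set_preference("network.proxy.http_port", int(port))
    options.set_preference("network.proxy.ssl", host)  # used for https:// URLs
    options.set_preference("network.proxy.ssl_port", int(port))
    return options


def make_firefox_with_proxy(host, port):
    """Start Firefox with the proxy preferences applied (hypothetical helper)."""
    # Imported lazily so apply_http_proxy can be exercised without Selenium installed.
    from selenium import webdriver

    options = apply_http_proxy(webdriver.FirefoxOptions(), host, port)
    return webdriver.Firefox(options=options)
```

Calling make_firefox_with_proxy("137.74.65.101", 80) and then driver.get('https://httpbin.org/ip') should show whether the proxy is being used, provided the proxy itself is reachable.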

WebDriverError@chrome instead of WebDriverError@firefox while using Firefox browser with selenium

I'm trying to write several tests using selenium, but I'm seeing the following strange behavior.
When I run the tests like this:
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.firefox.service import Service
options = FirefoxOptions()
service = Service()
brow = Firefox(service=service, options=options)
brow.execute("get", {'url': 'https://python.org'})
I get the result I expected, the python.org website is opened in Firefox browser.
But if I make a mistake in URL, I'm getting the following error:
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.firefox.service import Service
options = FirefoxOptions()
service = Service()
brow = Firefox(service=service, options=options)
brow.execute("get", {'url': 'qwerty'})
selenium.common.exceptions.InvalidArgumentException: Message: Malformed URL: URL constructor: qwerty is not a valid URL.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:186:5
InvalidArgumentError@chrome://remote/content/shared/webdriver/Errors.jsm:315:5
GeckoDriver.prototype.navigateTo@chrome://remote/content/marionette/driver.js:804:11
I just want to understand why I see WebDriverError@chrome here, and not WebDriverError@firefox or something like that.
Is this a bug, or am I doing something wrong?
These error messages...
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:186:5
InvalidArgumentError@chrome://remote/content/shared/webdriver/Errors.jsm:315:5
GeckoDriver.prototype.navigateTo@chrome://remote/content/marionette/driver.js:804:11
...containing the phrase @chrome may give the impression of strange behavior when using the GeckoDriver and Firefox combination.
However, as per @AutomatedTester's comment in the GitHub discussion Selenium 3.4.0-GeckoDriver 0.17.0 : GeckoDriver producing logs through Chromium/Chrome modules #787:
These errors are nothing to worry about. Mozilla uses different open source projects to build Firefox for different reasons. It showing Chrome errors means nothing in the big picture.
So you can ignore them safely.

Unable to load the webpage https://www.riachuelo.com.br/feminino/colecao-feminino using Selenium and Python

I've been trying to scrape this page (https://www.riachuelo.com.br/feminino/colecao-feminino) with Selenium, but I can't manage to access the HTML because it never loads. I've tried using random user agents and other browsers, but the problem persists. Any idea why this is happening?
Here is the code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent
URL = "https://www.riachuelo.com.br/feminino/colecao-feminino"
options = Options()
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(chrome_options=options,executable_path=r"C:\Program Files (x86)\chromedriver.exe")
driver.get(URL)
I ran your use case, loading the webpage at https://www.riachuelo.com.br/feminino/colecao-feminino with Selenium, as follows:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.riachuelo.com.br/feminino/colecao-feminino')
Like you, I hit the same roadblock: the webpage never loads.
Analysis
While inspecting the DOM tree of the webpage, you will find that some of the <iframe> and <script> tags refer to the keyword dist. As an example:
src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/../index.html#!/?token=c243ce95-db6c-4ab6-9f2b-bf60d69c2d3d&widget=true&top=40&text=Alguma%20d%C3%BAvida%3F&textcolor=ffffff&bgcolor=4E1D3A&from=bottomRigth"
<script id="dtbot-script" src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/dtbot.js?token=c243ce95-db6c-4ab6-9f2b-bf60d69c2d3d&widget=true&top=40&text=Alguma%20d%C3%BAvida%3F&textcolor=ffffff&bgcolor=4E1D3A&from=bottomRigth"></script>
This is a clear indication that the website is protected by the bot management service provider Distil Networks, so navigation by ChromeDriver gets detected and subsequently blocked.
Distil
As per the article There Really Is Something About Distil.it...:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.
Further,
"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
Reference
You can find a couple of detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Akamai Bot Manager detects WebDriver driven Chrome Browsing Context
Is there a version of selenium webdriver that is not detectable?
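The discussions listed above revolve around the navigator.webdriver flag, which a WebDriver-controlled browser exposes to page scripts. Here is a minimal sketch for checking what a page can see. reports_webdriver is a hypothetical helper name, and the flag is only one of several signals a bot management service may use, so a False result does not mean the browser is undetectable.

```python
def reports_webdriver(driver):
    """Return True if page JavaScript sees navigator.webdriver as truthy.

    `driver` is any object with Selenium's execute_script(script) method.
    """
    return bool(driver.execute_script("return navigator.webdriver"))
```

With a live session, reports_webdriver(driver) can be called right after driver.get(...) to see what the page's scripts observe.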

Firefox is not launching site

geckodriver launches Firefox, but Firefox doesn't load the URL. Please take a look and point out what's wrong with my function. It would be a great help, as I'm very new to Selenium and Python.
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
def Login(SiteUrl):
    driver = webdriver.Firefox()
    driver.get(SiteUrl)
if __name__ =="__main__":
    url = "www.google.com"
    Login(url)
There could be multiple reasons:
Try including the HTTP protocol in the URL, i.e. http://www.google.com
You might be behind a proxy server. See these SO questions: Selenium WebDriver: Firefox starts, but does not open the URL and Selenium WebDriver.get(url) does not open the URL
Your driver and browser versions do not match.
That simply means either your Selenium libraries or your geckodriver binary are out of date. Newer geckodriver releases are available now.
Download geckodriver from:
https://github.com/mozilla/geckodriver/releases
Download the new version of the Selenium Python libraries from the URL below:
https://pypi.python.org/pypi/selenium
Another thing, as Martin pointed out:
Add the protocol (http or https) to the URL before passing it.
So use the URL as:
url = "https://www.google.com"
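To guard against this mistake in a script, the URL can be normalized before it reaches driver.get. This is a minimal sketch using only the standard library. ensure_scheme is a hypothetical helper name, and defaulting to https is an assumption that may not suit every site.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it already starts with http(s), else prefix a scheme."""
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"
```

In the function from the question, this would be used as driver.get(ensure_scheme(SiteUrl)).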

Webdriver phantomjs no longer following link on click

I use a simple webdriver PhantomJS script to update some adverts on preloved.co.uk. This script worked great until recently, but then started failing with the "Click submitted but load failed" error after the login link was clicked. In accordance with this, I updated my version of PhantomJS to the latest stable, 1.9.7, following the guide here. However, now the login click does not seem to register either, and the page does not reload.
The first step is simply getting to the login form page.
from selenium import webdriver
br = webdriver.PhantomJS(service_log_path='/path/to/logfile.log')
url = "http://www.preloved.co.uk"
br.get(url)
# Go to login page
login_button = br.find_element_by_xpath('//div[@id="header-status-login"]/a')
login_button.click()
Normally (for example, if you replace the browser line with br = webdriver.Firefox()), this loads the login page and the script proceeds from there, but now it appears the click does not load the new page at all, and br.current_url is still 'http://www.preloved.co.uk/'.
Why doesn't this load work?
Even if I extract the href and do an explicit GET it doesn't seem to follow and reload:
newurl=login_button.get_attribute('href')
br.get(newurl)
br.current_url is still 'http://www.preloved.co.uk/'.
The login page is secured through https. Recently, the POODLE vulnerability forced websites to move away from SSLv3 for https, but since PhantomJS uses SSLv3 by default, the login page doesn't load. See also this answer.
This can be fixed by passing --ssl-protocol=tlsv1 or --ssl-protocol=any to PhantomJS, or by upgrading PhantomJS to at least version 1.9.8. It seems that the service_args argument could be used for that in the Python bindings for Selenium.
It looks like in the current official implementation the service_args cannot be passed from WebDriver to the Service in PhantomJS. You can subclass it:
from selenium import webdriver
from selenium.webdriver.phantomjs.service import Service
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
class PhantomJSService(webdriver.PhantomJS):
    def __init__(self, executable_path="phantomjs", port=0,
                 desired_capabilities=DesiredCapabilities.PHANTOMJS,
                 service_args=None, service_log_path=None):
        self.service = Service(executable_path,
                               port=port, service_args=service_args,
                               log_path=service_log_path)
        self.service.start()
        try:
            RemoteWebDriver.__init__(self,
                                     command_executor=self.service.service_url,
                                     desired_capabilities=desired_capabilities)
        except:
            self.quit()
            raise
It seems that this webdriver fork contains the necessary arguments to set those options.
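Where the installed Selenium release does accept service_args on webdriver.PhantomJS (an assumption; this applies only to older releases that still ship the PhantomJS driver), the SSL option can be passed without subclassing. This is a minimal sketch. phantomjs_ssl_args and make_phantomjs are hypothetical helper names, and only the two protocol values mentioned above are allowed.

```python
def phantomjs_ssl_args(protocol="any"):
    """Build the PhantomJS command-line argument that selects the SSL protocol."""
    allowed = {"tlsv1", "any"}  # the two values suggested in the answer above
    if protocol not in allowed:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return [f"--ssl-protocol={protocol}"]


def make_phantomjs(protocol="any"):
    """Start PhantomJS with the chosen SSL protocol (hypothetical helper)."""
    # Imported lazily; assumes a Selenium release that still provides
    # webdriver.PhantomJS and forwards service_args to the PhantomJS process.
    from selenium import webdriver

    return webdriver.PhantomJS(service_args=phantomjs_ssl_args(protocol))
```

In the script from the question, br = make_phantomjs("tlsv1") would replace the webdriver.PhantomJS(...) line.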
