I've written a script in python in combination with selenium to parse some company names from a webpage. The selector I've defined are flawless. However, as soon as the webpage opens up an annoying ad pops up hiding the data and for that I can't reach there. How can i kick it out and parse the data I would like to. I've tried with switching several iframes available in that webpage but none of them worked. The existing one which I used within my script throws an error showing Message: no such element: Unable to locate element.
This is what i tried so far:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.inc.com/inc5000/list/2017")
driver.switch_to_frame(driver.find_element_by_id("jw_player_iconic"))
for item in driver.find_elements_by_css_selector("#data-container .row"):
company = item.find_elements_by_css_selector(".company a")[0].text
print(company)
driver.quit()
The pop-up ad is something like below (the bright one):
You can try to wait for ad to close it by clicking "SKIP" button:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.inc.com/inc5000/list/2017")
driver.maximize_window()
try:
ad_iframe_close = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, "//span[.='SKIP']")))
ad_iframe_close.click()
except TimeoutException:
pass
for item in driver.find_elements_by_css_selector("#data-container .row"):
company = item.find_elements_by_css_selector(".company a")[0].text
print(company)
This should allow you to wait to close ad frame or do nothing in case it didn't appear whithin 3 seconds
Related
I'm trying to execute a selenium program in Python to go to a new URL on click of a button in the current homepage. I'm new to selenium and any help regarding this would be appreciated. Here's my code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://nmit.ac.in'
driver = webdriver.Chrome()
driver.get(url)
try:
# wait 10 seconds before looking for element
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located(By.LINK_TEXT, "Parent Portal")
)
except:
print()
driver.find_element(By.LINK_TEXT, "Parent Portal").click()
I have tried to increase the wait time as well as using all forms of the supported located strategies under the BY keyword, but to no avail. I keep getting this error.
As far as I know, you shouldn't be worried by those errors. Although, I proved your code and it's not finding any element in the web page. I can recommend you use the xpath of the element you want to find:
# wait 10 seconds before looking for element
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "/html//div[#class='wsmenucontainer']//button[#href='https://nmitparents.contineo.in/']"))
)
element.click()
#wait for the new view to be loaded
time.sleep(10)
driver.quit()
Psdt: you can use the extention Ranorex Selocity to extract a good and unique xpath of any element in a webpage and also test it!!
image
I click on a specific button on a page, but for some reason there is one of the buttons that I can't click on, even though it's positioned exactly like the other elements like it that I can click on.
The code below as you will notice, it opens a page, then clicks to access another page, do this step because only then can you be redirected to the real url that has the //int.
import datetime
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
with open('my_user_agent.txt') as f:
my_user_agent = f.read()
headers = {
'User-Agent': my_user_agent
}
options = Options()
options.set_preference("general.useragent.override", my_user_agent)
options.set_preference("media.volume_scale", "0.0")
options.page_load_strategy = 'eager'
driver = webdriver.Firefox(options=options)
today = datetime.datetime.now().strftime("%Y/%m/%d")
driver.get(f"https://int.soccerway.com/matches/{today}/")
driver.find_element(by=By.XPATH, value="//div[contains(#class,'language-picker-trigger')]").click()
time.sleep(3)
driver.find_element(by=By.XPATH, value="//li/a[contains(#href,'https://int.soccerway.com')]").click()
time.sleep(3)
try:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]")))
driver.find_element(by=By.XPATH, value="//a[contains(#class,'tbl-read-more-btn')]").click()
time.sleep(0.1)
except:
pass
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[#data-exponload='']//button[contains(#class,'expand-icon')]")))
for btn in driver.find_elements(by=By.XPATH, value="//div[#data-exponload='']//button[contains(#class,'expand-icon')]"):
btn.click()
time.sleep(0.1)
I've tried adding btn.location_once_scrolled_into_view before each click to make sure the button is correctly in the click position, but the problem still persists.
I also tried using the options mentioned here:
Selenium python Error: element could not be scrolled into view
But the essence of the case kept persisting in error, I couldn't understand what the flaw in the case was.
Error text:
selenium.common.exceptions.ElementNotInteractableException: Message: Element <button class="expand-icon"> could not be scrolled into view
Stacktrace:
RemoteError#chrome://remote/content/shared/RemoteError.jsm:12:1
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:192:5
ElementNotInteractableError#chrome://remote/content/shared/webdriver/Errors.jsm:302:5
webdriverClickElement#chrome://remote/content/marionette/interaction.js:156:11
interaction.clickElement#chrome://remote/content/marionette/interaction.js:125:11
clickElement#chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:204:29
receiveMessage#chrome://remote/content/marionette/actors/MarionetteCommandsChild.jsm:92:31
Edit 1:
I noticed that the error only happens when the element is colored orange (when they are colored orange it means that one of the competition games is happening now, in other words it is live).
But the button is still the same, it keeps the same element, so I don't know why it's not being clicked.
See the color difference:
Edit 2:
If you open the browser normally or without the settings I put in my code, the elements in orange are loaded already expanded, but using the settings I need to use, they don't come expanded. So please use the settings I use in the code so that the page opens the same.
What you missing here is to wrap the command in the loop opening those sections with try-except block.
the following code works. I tried running is several times.
import datetime
import time
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "eager"
webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options, desired_capabilities=caps)
wait = WebDriverWait(driver, 10)
today = datetime.datetime.now().strftime("%Y/%m/%d")
driver.get(f"https://int.soccerway.com/matches/{today}/")
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(#class,'language-picker-trigger')]"))).click()
time.sleep(5)
wait.until(EC.element_to_be_clickable((By.XPATH, "//li/a[contains(#href,'https://int.soccerway.com')]"))).click()
time.sleep(5)
try:
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]")))
driver.find_element(By.XPATH, "//a[contains(#class,'tbl-read-more-btn')]").click()
time.sleep(0.1)
except:
pass
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[#data-exponload='']//button[contains(#class,'expand-icon')]")))
for btn in driver.find_elements(By.XPATH, "//div[#data-exponload='' and not(contains(#class,'status-playing'))]//button[contains(#class,'expand-icon')]"):
btn.click()
time.sleep(0.1)
UPD
We need to open only closed elements. The already opened sections should be stayed open. In this case click will always work without throwing exceptions. To do so we just need to add such indication - click buttons not inside the section where status is currently playing.
So im Trying to make a discord bot that sends news from selected news services, and Ive got it working on every website, except Bild.de. They have got a banner you have to accept before accesing the website, and I cant get past that. Like I said, I had no problems on any other website, but this one.
My code (python):
import time
from selenium import webdriver
# selenium part
url = 'https://www.bild.de/home/newsticker/news/alle-news-54190636.bild.html'
browser = webdriver.Chrome()
browser.get(url)
time.sleep(5)
#trying to accept cookie banner
browser.find_element_by_xpath(
"/html/body/div/div[2]/div[3]/div[2]/button").click()
Error Message
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div/div[2]/div[3]/div[2]/button"}
Things to be noted down here.
Cookies button is in iframe, so first we have to switch to iframe in Selenium.
I am using execute script to click on it.
Remember to switch back to default content when you are done with the iframe.
Code :
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.bild.de/home/newsticker/news/alle-news-54190636.bild.html")
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title='SP Consent Message']")))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.message-component.message-row.mobile-reverse>div:nth-child(2)>button")))
driver.execute_script("arguments[0].click();", button)
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I'm writing my first real scraper and although in general it's been going well, I've hit a wall using Selenium. I can't get it to go to the next page.
Below is the head of my code. The output below this is just printing out data in terminal for now and that's all working fine. It just stops scraping at the end of page 1 and shows me my terminal prompt. It never starts on page 2. I would be so grateful if anyone could make a suggestion. I've tried selecting the button at the bottom of the page I'm trying to scrape using both the relative and full Xpath (you're seeing the full one here) but neither work. I'm trying to click the right-arrow button.
I built in my own error message to indicate whether the driver successfully found the element by Xpath or not. The error message fires when I execute my code, so I guess it's not finding the element. I just can't understand why not.
# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
try:
driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
except (TimeoutException, WebDriverException) as e:
print("A timeout or webdriver exception occurred.")
break
driver.quit()
What you can do is to set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next page element. All of this in a loop (its range is the number of pages you have to deal with).
XPath for the next page link :
//div[#class='pagination ctm-pagination']/ul[1]/li[last()-1]/a
Code could look like :
## imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
## count the number of pages you have
els = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='pagination ctm-pagination']/ul[1]/li[last()]/a"))).get_attribute("data-current-page")
## loop. at the end of the loop, click on the following page
for i in range(int(els)):
***scrape what you want***
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[#class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"))).click()
You were pretty close with while True and try-catch{} logic. To go to the next page using Selenium and python you have to induce WebDriverWait for element_to_be_clickable() and you can use either of the following Locator Strategies:
Code Block:
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
while True:
try:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class, 'state-active')]//following::li[1]/a[#href]"))).click()
print("Clicked for next page")
WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element_by_xpath("//a[contains(#class, 'state-active')]//following::li[1]/a[#href]")))
except (TimeoutException):
print("No more pages")
break
driver.quit()
Console Output:
Clicked for next page
No more pages
I'm trying to do some webscraping from a betting website:
As part of the process, I have to click on the different buttons under the "Favourites" section on the left side to select different competitions.
Let's take the ENG Premier League button as example. I identified the button as:
(source: 666kb.com)
The XPath is: //*[#id="SportMenuF"]/div[3] and the ID is 91.
My code for clicking on the button is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL Removed")
content = driver.find_element_by_xpath('//*[#id="SportMenuF"]/div[3]')
content.click()
Unfortunately, I always get this error message when I run the script:
"no such element: Unable to locate element:
{"method":"xpath","selector":"//*[#id="SportMenuF"]/div[3]"}"
I have tried different identifiers such as CCS Selector, ID and, as shown in the example above, the Xpath. I tried using waits and explicit conditions, too. None of this has worked.
I also attempted scraping some values from the website without any success:
from selenium import webdriver
from selenium.webdriver.common.by import By
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL removed")
content = driver.find_elements_by_class_name('price-val')
for entry in content:
print entry.text
Same problem, nothing shows up.
The website embeddes an iframe from a different website. Could this be the cause of my problems? I tried scraping directly from the iframe URL, too, which didn't work, either.
I would appreciate any suggestions.
Sometimes elements are either hiding behind an iframe, or they haven't loaded yet
For the iframe check, try:
driver.switch_to.frame(0)
For the wait check, try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, '-put the x-path here-')))