I'm trying to do some webscraping from a betting website:
As part of the process, I have to click on the different buttons under the "Favourites" section on the left side to select different competitions.
Let's take the ENG Premier League button as example. I identified the button as:
(source: 666kb.com)
The XPath is: //*[#id="SportMenuF"]/div[3] and the ID is 91.
My code for clicking on the button is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL Removed")
content = driver.find_element_by_xpath('//*[#id="SportMenuF"]/div[3]')
content.click()
Unfortunately, I always get this error message when I run the script:
"no such element: Unable to locate element:
{"method":"xpath","selector":"//*[#id="SportMenuF"]/div[3]"}"
I have tried different identifiers such as CCS Selector, ID and, as shown in the example above, the Xpath. I tried using waits and explicit conditions, too. None of this has worked.
I also attempted scraping some values from the website without any success:
from selenium import webdriver
from selenium.webdriver.common.by import By
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL removed")
content = driver.find_elements_by_class_name('price-val')
for entry in content:
print entry.text
Same problem, nothing shows up.
The website embeddes an iframe from a different website. Could this be the cause of my problems? I tried scraping directly from the iframe URL, too, which didn't work, either.
I would appreciate any suggestions.
Sometimes elements are either hiding behind an iframe, or they haven't loaded yet
For the iframe check, try:
driver.switch_to.frame(0)
For the wait check, try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, '-put the x-path here-')))
Related
For one study research I would like to scrape some links from webpages which located out of viewport (to see this links you need to scroll down the page).
Page example (https://www.twitch.tv/lirik)
Link example: https://www.amazon.com/dp/B09FVR22R2
Link located in div class='Layout-sc-nxg1ff-0 itdjvg default-panel' (in total 16 links on the page).
I have write the script but I get empty list:
from selenium import webdriver
import time
browser = webdriver.Firefox()
browser.get('https://www.twitch.tv/lirik')
time.sleep(3)
browser.execute_script("window.scrollBy(0,document.body.scrollHeight)")
time.sleep(3)
panel_blocks = browser.find_elements(by='class name', value='Layout-sc-nxg1ff-0 itdjvg default-panel')
browser.close()
print(panel_blocks)
print(type(panel_blocks))
I just get empty list after page was loaded. Here is output from the script above:
/usr/local/bin/python /Users/greg.fetisov/PycharmProjects/baltazar_platform/Twitch_parser.py
[]
<class 'list'>
Process finished with exit code 0
p.s.
when webdriver opens the page, I see there is no scroll down action. It just open a page and then close it after time.sleep cooldown.
How I can change the script to get the links properly?
Any help or advice would be appreciated!
You are using a wrong locator.
You should use expected conditions explicit waits instead of hardcoded pauses.
find_elements method returns a list of web elements while you want to the link inside the element(s).
This should work better:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
browser = webdriver.Firefox()
browser.get('https://www.twitch.tv/lirik')
wait = WebDriverWait(browser, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[#class='channel-panels-container']//a")))
time.sleep(0.5)
link_blocks = browser.find_elements_by_xpath("//div[#class='channel-panels-container']//a")
for link_block in link_blocks:
link = link_block.get_attribute("href")
print(link)
browser.close()
To print the values of the href attribute you have to induce WebDriverWait for the visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get("https://www.twitch.tv/lirik")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.Layout-sc-nxg1ff-0.itdjvg.default-panel > a")))])
Console Output:
['https://www.amazon.com/dp/B09FVR22R2', 'http://bs.serving-sys.com/Serving/adServer.bs?cn=trd&pli=1077437714&gdpr=$%7BGDPR%7D&gdpr_consent=$%7BGDPR_CONSENT_68%7D&adid=1085757156&ord=[timestamp]', 'https://store.epicgames.com/lirik/rumbleverse', 'https://bitly/3GP0cM0', 'https://lirik.com/', 'https://streamlabs.com/lirik', 'https://twitch.amazon.com/tp', 'https://www.twitch.tv/subs/lirik', 'https://www.youtube.com/lirik?sub_confirmation=1', 'http://www.twitter.com/lirik', 'http://www.instagram.com/lirik', 'http://gfuel.ly/lirik', 'http://www.cyberpowerpc.com/', 'https://www.cyberpowerpc.com/page/Intel/LIRIK/', 'https://discord.gg/lirik', 'http://www.amazon.com/?_encoding=UTF8&camp=1789&creative=390957&linkCode=ur2&tag=l0e6d-20&linkId=YNM2SXSSG3KWGYZ7']
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I'm writing my first real scraper and although in general it's been going well, I've hit a wall using Selenium. I can't get it to go to the next page.
Below is the head of my code. The output below this is just printing out data in terminal for now and that's all working fine. It just stops scraping at the end of page 1 and shows me my terminal prompt. It never starts on page 2. I would be so grateful if anyone could make a suggestion. I've tried selecting the button at the bottom of the page I'm trying to scrape using both the relative and full Xpath (you're seeing the full one here) but neither work. I'm trying to click the right-arrow button.
I built in my own error message to indicate whether the driver successfully found the element by Xpath or not. The error message fires when I execute my code, so I guess it's not finding the element. I just can't understand why not.
# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
try:
driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
except (TimeoutException, WebDriverException) as e:
print("A timeout or webdriver exception occurred.")
break
driver.quit()
What you can do is to set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next page element. All of this in a loop (its range is the number of pages you have to deal with).
XPath for the next page link :
//div[#class='pagination ctm-pagination']/ul[1]/li[last()-1]/a
Code could look like :
## imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
## count the number of pages you have
els = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='pagination ctm-pagination']/ul[1]/li[last()]/a"))).get_attribute("data-current-page")
## loop. at the end of the loop, click on the following page
for i in range(int(els)):
***scrape what you want***
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[#class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"))).click()
You were pretty close with while True and try-catch{} logic. To go to the next page using Selenium and python you have to induce WebDriverWait for element_to_be_clickable() and you can use either of the following Locator Strategies:
Code Block:
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
while True:
try:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class, 'state-active')]//following::li[1]/a[#href]"))).click()
print("Clicked for next page")
WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element_by_xpath("//a[contains(#class, 'state-active')]//following::li[1]/a[#href]")))
except (TimeoutException):
print("No more pages")
break
driver.quit()
Console Output:
Clicked for next page
No more pages
I've created a script in python using selenium to click on a like button available in a webpage. I've used xpath in order to locate that button and I think I've used it correctly. However, the script doesn't seem to find that button and as a results it throws TimeoutException error pointing at the very line containing the xpath.
As it is not possible to hit that like button without logging in, I expect the script to get the relevant html connected to that button so that I understand I could locate it correctly.
I've tried with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.instagram.com/p/CBi_eIuAwbG/"
with webdriver.Chrome() as driver:
wait = WebDriverWait(driver,10)
driver.get(link)
item = wait.until(EC.visibility_of_element_located((By.XPATH,"//button[./svg[#aria-label='Like']]")))
print(item.get_attribute("innerHTML"))
How can I locate that like button visible as heart symbol using selenium?
To click on Like Button induce WebDriverWait() and wait for visibility_of_element_located() and below xpath.
Then scroll the element into view and click.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver.get("https://www.instagram.com/p/CBi_eIuAwbG/")
element=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH,"//button[.//*[name()='svg' and #aria-label='Like']]")))
element.location_once_scrolled_into_view
element.click()
You can do it like this
likeSVG = driver.find_element(By.CSS_SELECTOR, 'svg[aria-label="Like"]')
likeBtn = likeSVG.find_element(By.XPATH, './..')
likeBtn.click()
likeBtn is equal to the parent of the likeSVG div as you can use XPATH similar to file navigation commands in a CLI.
Try using the .find_element_by_xpath(xPath) method (Uses full xpath):
likeXPATH = "/html/body/div[1]/section/main/div/div[1]/article/div[2]/section[1]/span[1]/button"
likeElement = driver.find_element_by_xpath(likeXPATH)
likeElement.click()
I am trying to web scrape Instagram using Python and Selenium. I have had many issues regarding locating the elements but somehow managed to pull through when I tried enough xpaths. But when I try to web scrape Donald Trump's following list (I want this to work for ANY USER'S following list/page), it just doesn't work. Here is the error it's throwing:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[#id="f3b066159b38864"]/div/div/a"}
I get the xpaths by right clicking on the element using Google Chrome's inspect feature. If anyone needs me to post the full code I'd be happy to do so.
Try below xpath::
wait = WebDriverWait(driver, 20)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(),'laraleatrump')]")))
Note : please add below imports to your solution
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
Working solution :
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
driver.get("https://www.instagram.com/realdonaldtrump/")
driver.maximize_window()
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(.,'following')]"))).click()
peoples = wait.until(
EC.visibility_of_all_elements_located((By.XPATH, "//div[#role='dialog']//div[contains(#class,'PZuss')]//a")))
for peoplename in peoples:
print peoplename.text
why is this code showing NoSuchElementException error? I checked Chrome DOM my XPATH able to find the destinated tag.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
class Firefox():
def test(self):
base_url='https://oakliquorcabinet.com/'
driver = webdriver.Chrome(executable_path=r'C:\Users\Vicky\Downloads\chromedriver')
driver.get(base_url)
search=driver.find_element(By.XPATH,'//div[#class="box-footer"]/button[2]')
search.click()
ff=Firefox()
ff.test()
Selenium by default waits for the DOM to load and tries to find the element. But, the confirmation pop up becomes visible after some time the main page is loaded.
Use explicit wait to fix this issue.
add these imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
change line in script:
search = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.XPATH, '//div[#class="box-footer"]/button[2]')))