Scrape data from dynamic table using Python & Selenium

I'm trying to scrape data from this URL: https://qmjhldraft.rinknet.com/results.htm?year=2018, but I can't seem to scrape even a single name from the dynamic table.
Here's the code that I currently have:
from selenium import webdriver

PATH = r'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')

element = driver.find_element_by_xpath('//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]').text
print(element)
The code gives me this error:
NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]"}
There's obviously something wrong with my XPath, but I can't figure out what.
Thanks a lot for the help!

The first problem is that the website loads its content dynamically, so you need to give the page time to load fully. A quick fix is to sleep before locating the element:

import time

time.sleep(2)  # adjust the delay to your needs
element = driver.find_element_by_xpath('//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]').text

A better way is to use Explicit Waits: they wait for the element to load before executing the next step.
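For instance, a minimal explicit wait for the single cell from the question might look like this (the 10-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the cell to be present, then read its text
cell = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]'))
)
print(cell.text)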
The second problem is that you shouldn't just copy the XPath from the Chrome dev tools: such auto-generated paths are brittle and break as soon as the page structure shifts.
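For instance, compare the selector copied from dev tools with one anchored on a stable attribute (both taken from this page):

# brittle: copied from dev tools, tied to the exact DOM position
'//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]'
# robust: matches every player row through its stable rnid attribute
'//tr[@rnid]/td[3]'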
To get all the names, you can do this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = r'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')

try:
    # wait until at least one name cell is present in the table
    elements = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//tr[@rnid]/td[3]"))
    )
finally:
    # every player row carries an rnid attribute; the third cell holds the name
    names = driver.find_elements_by_xpath('//tr[@rnid]/td[3]')
    for name in names:
        print(name.text)
    driver.quit()
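Side note (not from the original answer): the find_element_by_* helpers used above belong to the Selenium 3 API and have been removed in recent Selenium 4 releases; the modern equivalent passes a locator through the By class:

# Selenium 4 style: same XPath, passed through driver.find_elements
names = driver.find_elements(By.XPATH, '//tr[@rnid]/td[3]')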

Related

How do I resolve this error in Selenium: ERROR: Couldn't read tbsCertificate as SEQUENCE, ERROR: Failed parsing Certificate

I'm trying to execute a Selenium program in Python that goes to a new URL when a button on the current homepage is clicked. I'm new to Selenium, and any help regarding this would be appreciated. Here's my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://nmit.ac.in'
driver = webdriver.Chrome()
driver.get(url)
try:
    # wait 10 seconds before looking for element
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Parent Portal"))
    )
except:
    print()
driver.find_element(By.LINK_TEXT, "Parent Portal").click()
I have tried increasing the wait time as well as using all of the supported locator strategies under the By class, but to no avail. I keep getting this error.
As far as I know, you shouldn't be worried about those certificate errors. However, I tried your code and it's not finding any element on the web page. I recommend using the XPath of the element you want to find:
# wait 10 seconds before looking for the element
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "/html//div[@class='wsmenucontainer']//button[@href='https://nmitparents.contineo.in/']"))
)
element.click()
# wait for the new view to be loaded (requires "import time")
time.sleep(10)
driver.quit()
P.S.: you can use the extension Ranorex Selocity to extract a good and unique XPath for any element on a webpage and also test it!

Cannot find the element on the webpage

I cannot find the element. The code was written in Python using Visual Studio Code:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
paginaHit = 'https://hit.com.do/solicitud-de-verificacion/'
driver.get(paginaHit)
driver.maximize_window()
time.sleep(5)

bl = 'SMLU7318830A'
elementoBL = driver.find_element(By.XPATH, '//*[@id="billoflanding"]').send_keys(bl)
# WebDriverWait(driver,2).until(EC.element_to_be_clickable((By.NAME, "bl"))).Click()
The code runs, but it cannot find the element on the webpage.
The portion of the page you are trying to access is inside an EMBED tag. It behaves much like an IFRAME, so I would start by switching the context to the EMBED tag and then searching for the element.
driver = webdriver.Chrome()
paginaHit = 'https://hit.com.do/solicitud-de-verificacion/'
driver.get(paginaHit)
driver.maximize_window()

# the form lives inside an EMBED tag; switch into it like an iframe
embed = driver.find_element(By.CSS_SELECTOR, "embed")
driver.switch_to.frame(embed)

bl = 'SMLU7318830A'
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.ID, "billoflanding"))).send_keys(bl)
A couple of additional points:
Don't use sleeps; sleeps are a bad practice. Instead, use WebDriverWait when you need to wait for something to happen.
If you are using an ID to find an element, use By.ID and not XPath. ID should be preferred when available; next, a CSS selector; and finally XPath only when needed, e.g. to locate elements by contained text or to do complicated DOM traversal (see the sketch below).
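A minimal illustration of that ordering, reusing the billoflanding field from above (the label text in the XPath line is a hypothetical placeholder):

# preferred: By.ID when the element has a stable id
driver.find_element(By.ID, "billoflanding")
# next best: a CSS selector
driver.find_element(By.CSS_SELECTOR, "#billoflanding")
# XPath only when you need contained text or complex DOM traversal
driver.find_element(By.XPATH, "//label[contains(., 'Bill of Lading')]")  # hypothetical label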

Scraping flex-element Selenium Python

I am trying to scrape some tennis statistics starting from 01-01-2019.
For this I try to scrape the following webpage with selenium: https://www.sofascore.com/de/tennis/2019-01-01
When I click on the first match manually the container on the right side changes and shows the statistics.
This is what I want to access automatically.
When I try to click on the element with Selenium, it redirects me to another page.
Can anyone tell me why it does not just show the same content as a manual click does, and how I can solve this issue?
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
import time
options = Options()
options.binary_location = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
browser = webdriver.Chrome(chrome_options=options)

url = 'https://www.sofascore.com/de/tennis/2019-01-01'
browser.get(url)
browser.maximize_window()

xpath = '/html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div'
browser.find_element_by_xpath(xpath).click()
time.sleep(2)
browser.close()
You can use the below XPath:
//div[contains(@class, 'Col-pm5mcz-')]//descendant::div[contains(@class, 'styles__StyledWidget-')]
and get its innerHTML using the get_attribute method.
Code:

from time import sleep

url = "https://www.sofascore.com/de/tennis/2019-01-01"
driver.get(url)

xpath = '/html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div'
driver.find_element_by_xpath(xpath).click()
sleep(2)

details = driver.find_element_by_xpath("//div[contains(@class, 'Col-pm5mcz-')]//descendant::div[contains(@class, 'styles__StyledWidget-')]").get_attribute('innerHTML')
print(details)
The XPath that you are using is an absolute XPath: /html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div
Try replacing it with a relative XPath.
See if this works:
tableRows = driver.find_elements_by_xpath(".//div[@class='ReactVirtualized__Grid ReactVirtualized__List']//following::div/a[contains(@class,'EventCellstyles__Link')]")
for e in tableRows:
    e.click()
    # you can add an explicit wait here for the statistics section to load
    driver.find_element_by_xpath(".//a[text()='Statistiken']").click()
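For the wait mentioned in the comment above, a minimal sketch (assuming the WebDriverWait-as-wait, EC, and By imports from the question code) could be:

# wait until the statistics tab is clickable before clicking it
wait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, ".//a[text()='Statistiken']"))
).click()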

Selenium is returning empty text for elements that definitely have text

I'm practicing trying to scrape my university's course catalog. I have a few lines in Python that open the URL in Chrome and click the search button to bring up the course catalog. When I go to extract the text using find_elements_by_xpath(), it returns blank. When I use the dev tools in Chrome, there definitely is text there.
from selenium import webdriver
import time

driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)

iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
course = driver.find_elements_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]')
print(course)
I'm trying to extract the text from the element 'OSR_CAT_SRCH_OSR_CRSE_HEADER$0'. I don't understand why it's not returning the text values, especially when I can see with dev tools that it contains text.
You are not using .text, which is why you are not getting the text:
course = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]').text
(Note find_element, singular: the list returned by find_elements has no .text attribute.)
Try the above change in the second-to-last line.
Below is the full code after the changes:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)

# the search page is rendered inside an iframe; switch into it first
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)

element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()

# wait up to 10 seconds for the first course header, then read its text
course = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]'))
).text
print(course)

Click on element with class name in Selenium

I am trying to scrape the opening hours of bars from a website. There is a list of bars, and if you navigate to a venue its opening hours are available. I am having an issue clicking on an element when it has a class name.
I have written the code to get the hours from one venue; however, I am unable to navigate to each venue from the first link.
This code works when I get the hours for one venue:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.designmynight.com/london/bars/soho/six-storeys')

hours = driver.find_element_by_xpath('//li[@id="hours"]')
hours.click()
hoursTable = driver.find_elements_by_css_selector("table.opening-times tr")
for row in hoursTable:
    print(row.text)
The issue is that when I try to navigate from the search page, I am unable to click into each link. Can anyone see what I am doing wrong?
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.designmynight.com/london/search-results#!?type_of_venue=512b2019d5d190d2978c9ea9&type_of_venue=512b2019d5d190d2978c9ea8&type_of_venue=512b2019d5d190d2978c9ead&type_of_venue=512b2019d5d190d2978c9eaa&type_of_venue=512b2019d5d190d2978c9eab&type=&q=&radius=')

venue = driver.find_element_by_xpath('//a[@class="ng-binding"]')
venue.click()

# this should then lead me to the following link ->
driver.get('https://www.designmynight.com/london/bars/soho/six-storeys')
hours = driver.find_element_by_xpath('//li[@id="hours"]')
hours.click()
hoursTable = driver.find_elements_by_css_selector("table.opening-times tr")
for row in hoursTable:
    print(row.text)
All links with ng-binding class names are generated dynamically, so you need to wait until the link appears in the DOM and is also clickable:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://www.designmynight.com/london/search-results#!?type_of_venue=512b2019d5d190d2978c9ea9&type_of_venue=512b2019d5d190d2978c9ea8&type_of_venue=512b2019d5d190d2978c9ead&type_of_venue=512b2019d5d190d2978c9eaa&type_of_venue=512b2019d5d190d2978c9eab&type=&q=&radius=')
venue = wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[#class="ng-binding"]')))
venue.click()
But if you want to follow each link, I'd suggest not clicking the links, but getting the list of hrefs and then opening each one, as below:
xpath = '//a[@class="ng-binding"]'
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))

# collect the href of every venue link, then visit each page directly
links = [venue.get_attribute('href') for venue in driver.find_elements_by_xpath(xpath)]
for link in links:
    driver.get(link)
    hours = driver.find_element_by_xpath('//li[@id="hours"]')
    hours.click()
    hoursTable = driver.find_elements_by_css_selector("table.opening-times tr")
    for row in hoursTable:
        print(row.text)
I feel the issue is that the driver is trying to locate the element before it is available in the DOM. Try waiting until the element is present, instead of directly trying to find it.
You could replace this line:
venue = driver.find_element_by_xpath('//a[@class="ng-binding"]')
with:
venue = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//a[@class="ng-binding"]')))
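One caveat: presence_of_element_located only guarantees the element is attached to the DOM, not that it can be interacted with; since the very next step is a click, the element_to_be_clickable condition used in the first answer is a safer drop-in:

venue = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[@class="ng-binding"]')))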
Note that you'll need to do a few imports to do this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Source: http://selenium-python.readthedocs.io/waits.html
