Find previous element using Selenium Python

I have HTML:
<div class="value">
<div style="float: right;">100.00</div>
<span class="yellow">Frequency:</span>
</div>
With selenium By.XPATH '//*[text()="Frequency:"]' I'm able to locate the <span> element. My goal is to get the innerText of the previous <div>.
Can I do it with selenium or should I use bs4 for this task?
I tried the WebElement `parent` attribute after reading the documentation, but I was unable to make it work.
PS: I can't find the parent element or the div I need directly.

To print the text from the <div> tag, i.e. 100.00, with respect to the <span>, you can use either of the following locator strategies:
Using xpath and get_attribute("innerHTML"):
print(driver.find_element(By.XPATH, "//span[text()='Frequency:']/preceding::div[1]").get_attribute("innerHTML"))
Using xpath and the text attribute:
print(driver.find_element(By.XPATH, "//span[text()='Frequency:']/preceding::div[1]").text)
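If you do go the parsing route you mentioned (bs4 or similar), the same "previous element" lookup is easy to express in plain Python. Below is a minimal sketch using only the standard library's html.parser (BeautifulSoup would be shorter); the HTML string is the snippet from the question, and the class name `PreviousDivText` is just an illustrative helper, not a library API:

```python
from html.parser import HTMLParser

class PreviousDivText(HTMLParser):
    """Remembers the text of the most recent <div>, and captures it
    when the target <span> text is seen."""
    def __init__(self, target):
        super().__init__()
        self.target = target
        self.current_tag = None
        self.last_div_text = None
        self.result = None

    def handle_starttag(self, tag, attrs):
        self.current_tag = tag

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current_tag == "div":
            self.last_div_text = text
        elif self.current_tag == "span" and text == self.target:
            self.result = self.last_div_text

html = '''<div class="value">
<div style="float: right;">100.00</div>
<span class="yellow">Frequency:</span>
</div>'''

parser = PreviousDivText("Frequency:")
parser.feed(html)
print(parser.result)  # 100.00
```

This trades the browser round-trip for a one-pass scan of the page source, which is usually enough when the page is static.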

Selenium Python: Extracting text from elements having no class

I am very new to web scraping. I am working with Selenium and want to extract the text from span tags. The tags have no classes or ids. The span tags are inside li tags, and I need to extract the text from those spans. I don't know how to do that. Could you please help me?
HTML of the elements:
<div class="cmeStaticMediaBox cmeComponent section">
<div>
<ul class="cmeList">
<li class="cmeListContent cmeContentGroup">
<ul class="cmeHorizontalList cmeListSeparator">
<li>
<!-- Default clicked -->
<span>VOI By Exchange</span>
</li>
<li>
<a href="https://www.cmegroup.com/market-data/volume-open-interest/agriculture-commodities-volume.html" class="none" target="_self">
<span>Agricultural</span></a>
</li>
<li>
<a href="https://www.cmegroup.com/market-data/volume-open-interest/energy-volume.html" class="none" target="_self">
<span>Energy</span></a>
</li>
</ul>
</li>
</ul>
</div>
</div>
The simplest way to do this is:
for e in driver.find_elements(By.CSS_SELECTOR, "ul.cmeHorizontalList a"):
    print(e.text)
Some pitfalls in other answers...
You shouldn't use exceptions to control flow. It's just a bad practice and is slower.
You shouldn't use Copy > XPath from a browser. Most times this generates XPaths that are very brittle. Any XPath that starts at the HTML tag, has more than a few levels, or uses a number of indices (e.g. div[2] and the like) is going to be very brittle. Any even minor change to the page will break that locator.
Prefer CSS selectors over XPath. CSS selectors are better supported, faster, and the syntax is simpler.
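To make the flow-control point concrete, here is the same shape in plain Python, no browser needed: probing one index at a time until an exception, versus iterating over the full result. Note that driver.find_elements() returns an empty list when nothing matches, so the iterating version never needs an exception to find the end. The `items` list is stand-in data for what a locator would return:

```python
items = ["VOI By Exchange", "Agricultural", "Energy"]

# Anti-pattern: probe index by index and rely on the failure to stop,
# mirroring a loop that bumps an XPath index until NoSuchElementException.
collected = []
i = 0
while True:
    try:
        collected.append(items[i])
    except IndexError:
        break
    i += 1

# Preferred: fetch everything once and iterate; the loop ends naturally.
for text in items:
    print(text)

print(collected == items)  # True
```

The preferred form is both faster (one lookup instead of N+1) and harder to get wrong.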
EDIT
Since you need to use Selenium, you can use XPaths to locate elements when there is no attribute you can refer to. In your browser press F12, then right-click the element of interest and choose "Copy -> XPath". This is the proposed solution (I assume you have Chrome and chromedriver in the same folder as the .py file):
import os
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium import webdriver

url = "https://www.cmegroup.com/market-data/volume-open-interest/metals-volume.html"
i = 1
options = webdriver.ChromeOptions()
# this flag won't open a browser window; if you don't need the browser window, uncomment this line
# options.add_argument("--headless")
driver = webdriver.Chrome(
    options=options, executable_path=os.getcwd() + "/chromedriver.exe"
)
driver.get(url)
while True:
    xpath = f"/html/body/div[1]/div[2]/div/div[2]/div[2]/div/ul/li/ul/li[{i}]/a/span"
    try:
        res = driver.find_element(By.XPATH, xpath)
    except NoSuchElementException:
        # There are no more span elements in the li
        break
    print(res.text)
    i += 1
Results:
VOI By Exchange
Agricultural
Energy
Equities
FX
Interest Rates
You can extend this snippet to handle the .csv download from each page.
OLD
If you are working with a static HTML page (like the one you provided in the question), I suggest using BeautifulSoup. Selenium is better suited when you have to click, fill forms, or otherwise interact with a web page. Here's a snippet with my solution:
from bs4 import BeautifulSoup
html_doc = """
<div class="cmeStaticMediaBox cmeComponent section">
<div>
<ul class="cmeList">
<li class="cmeListContent cmeContentGroup">
<ul class="cmeHorizontalList cmeListSeparator">
<li>
<!-- Default clicked -->
<span>VOI By Exchange</span>
</li>
<li>
<a href="https://www.cmegroup.com/market-data/volume-open-interest/agriculture-commodities-volume.html"
class="none" target="_self">
<span>Agricultural</span></a>
</li>
<li>
<a href="https://www.cmegroup.com/market-data/volume-open-interest/energy-volume.html" class="none"
target="_self">
<span>Energy</span></a>
</li>
</ul>
</li>
</ul>
</div>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
for span in soup.find_all("span"):
    print(span.text)
And the result will be:
VOI By Exchange
Agricultural
Energy
To extract the desired texts e.g. VOI By Exchange, Agricultural, Energy, etc you need to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.cmegroup.com/market-data/volume-open-interest/exchange-volume.html')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.cmeHorizontalList.cmeListSeparator li span")))])
Using XPATH:
driver.get('https://www.cmegroup.com/market-data/volume-open-interest/exchange-volume.html')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='onetrust-accept-btn-handler']"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='cmeHorizontalList cmeListSeparator']//li//span")))])
Console Output:
['VOI By Exchange', 'Agricultural', 'Energy', 'Equities', 'FX', 'Interest Rates', 'Metals']
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

how to get div text with python selenium?

How can I get the text "950" from the div that has neither an ID nor a class with Python Selenium?
<div class="player-hover-box" style="display: none;">
<div class="ps-price-hover">
<div><img class="price-platform-img-hover"></div>
<div>950</div>
</div>
I don't know how to access this div and its text.
In case player-hover-box is a unique class name, you can use the following command:
price = driver.find_element_by_xpath('//div[@class="player-hover-box"]/div/div[2]').text
In case there are more products on that page with the similar HTML structure your XPath locator should contain some unique relation to some other element.
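One caveat: the outer div in the question carries style="display: none;", and Selenium's .text property only returns rendered text, so it can come back empty for hidden elements. get_attribute("textContent") reads the raw DOM text regardless of visibility. The sketch below demonstrates the fallback with a stand-in object (no live browser needed); with a real WebElement the same two-line pattern applies, and both `FakeHiddenElement` and `element_text` are illustrative names, not Selenium APIs:

```python
class FakeHiddenElement:
    """Stand-in for a WebElement inside a display:none container:
    Selenium's .text is empty, but textContent still holds the value."""
    text = ""

    def get_attribute(self, name):
        return "950" if name == "textContent" else None

def element_text(el):
    # Prefer rendered text; fall back to raw DOM textContent for hidden nodes.
    return el.text or (el.get_attribute("textContent") or "").strip()

print(element_text(FakeHiddenElement()))  # 950
```

With a real driver the usage would be `element_text(driver.find_element(...))`, which returns the visible text when the element is shown and the DOM text when it is hidden.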

Select an element in Selenium, Python, not by using XPath

I was trying to scrape a website, and I need to select only the ul element inside the div with the class "Slider__SliderWrapper-sc-143uniy-0 jrPmnS". However, since there are many div tags with the same class, the only way I can select just the ul I need is by looking at the href of the a tag inside the h2.
I can't use xpath, because div tags always change position.
<div>
<h2><a class="slider-components__SectionLink-sc-1r2bduf-3 jchpWs" href="rightOne">Right!</a></h2>
<div class="Slider__SliderWrapper-sc-143uniy-0 jrPmnS">
<ul class="Slider__List-sc-143uniy-1 MTYOL">
the right ul
</ul>
</div>
</div>
<div>
<h2><a class="slider-components__SectionLink-sc-1r2bduf-3 jchpWs" href="wrongOne">Something else</a></h2>
<div class="Slider__SliderWrapper-sc-143uniy-0 jrPmnS">
<ul class="Slider__List-sc-143uniy-1 MTYOL">
the wrong ul
</ul>
</div>
</div>
I thought about using css selector but I don't know how to, any help?
You definitely CAN use XPath to access the href attribute AND its contents:
//a[contains(@href,'rightOne')]
and for the ul:
//h2/a[contains(@href,'rightOne')]/../following-sibling::div/ul
Try this XPath:
//a[@href='rightOne']/../following-sibling::div/ul
Explanation:
You cannot use css_selector or any other locator here, since you depend on the a tag and have to traverse upwards in the DOM first. We use /.. for that (alternatively /parent::h2), then move to the next sibling with /following-sibling::div, and finally down to the ul child.
You cannot get a parent element with a CSS selector. Check here: Is there a CSS parent selector?
In your case you would need to get the parent of a[href=rightOne] and get the ul of the following sibling.
With css you could use one of these locators:
div:nth-child(1) .Slider__SliderWrapper-sc-143uniy-0.jrPmnS>.Slider__List-sc-143uniy-1.MTYOL
Or
div:nth-child(1) .Slider__SliderWrapper-sc-143uniy-0.jrPmnS>ul
I would choose any of the XPaths proposed in the other two answers if there are no restrictions on selectors.
But if you are using a library such as BeautifulSoup, you will have to use CSS selectors, as it does not support XPath. So use the ones I proposed.
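Since the relationship here is purely structural ("the ul inside the div that follows the h2 whose link has this href"), it can also be resolved outside the browser once you have the page source. Below is a sketch using only the standard library's html.parser (BeautifulSoup or lxml would be terser); `UlForHref` is an illustrative helper name, and the class attributes are shortened for readability:

```python
from html.parser import HTMLParser

class UlForHref(HTMLParser):
    """Collects the text of the first <ul> that follows an <a>
    with the given href."""
    def __init__(self, href):
        super().__init__()
        self.href = href
        self.armed = False        # saw the matching <a>
        self.in_target_ul = False
        self.result = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") == self.href:
            self.armed = True
        elif tag == "ul" and self.armed:
            self.in_target_ul = True

    def handle_endtag(self, tag):
        if tag == "ul" and self.in_target_ul:
            self.in_target_ul = False
            self.armed = False

    def handle_data(self, data):
        if self.in_target_ul and data.strip():
            self.result.append(data.strip())

html = '''<div>
<h2><a class="jchpWs" href="rightOne">Right!</a></h2>
<div class="jrPmnS"><ul class="MTYOL">the right ul</ul></div>
</div>
<div>
<h2><a class="jchpWs" href="wrongOne">Something else</a></h2>
<div class="jrPmnS"><ul class="MTYOL">the wrong ul</ul></div>
</div>'''

p = UlForHref("rightOne")
p.feed(html)
print(p.result)  # ['the right ul']
```

This mirrors what the XPath answers do in the browser, anchored on the href rather than on positions that "always change".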

How to find previous WebElements relative to a particular WebElement with selenium python

I have the below HTML snippet.
<div class="header">Planets</div>
<div class="event">Jupiter</div>
<div class="event">Mars</div>
<div class="header">Stars</div>
<div class="event">Acturus</div>
<div class="event">Pleaides</div>
Using driver.find_elements_by_class_name("event"), I am able to retrieve all the div tags with class "event".
I want to navigate to the previous sibling and retrieve the div tag with class "header" for each WebElement.
Switch to find_elements_by_xpath:
driver.find_elements_by_xpath("//div[#class='event']/preceding-sibling::div[#class='header']")
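That XPath returns the header divs themselves. If the goal is to know which header belongs to each event, one option is a per-element relative lookup such as event.find_element(By.XPATH, "preceding-sibling::div[@class='header'][1]"); another is to fetch all the divs once and pair them up in document order in Python. A sketch of the pairing logic, with tuples standing in for what Selenium would return (and `pair_events_with_headers` an illustrative helper, not a library function):

```python
def pair_events_with_headers(divs):
    """divs: list of (class_name, text) tuples in document order."""
    pairs = []
    current_header = None
    for cls, text in divs:
        if cls == "header":
            current_header = text
        elif cls == "event":
            pairs.append((current_header, text))
    return pairs

divs = [
    ("header", "Planets"),
    ("event", "Jupiter"),
    ("event", "Mars"),
    ("header", "Stars"),
    ("event", "Acturus"),
    ("event", "Pleaides"),
]
print(pair_events_with_headers(divs))
# [('Planets', 'Jupiter'), ('Planets', 'Mars'), ('Stars', 'Acturus'), ('Stars', 'Pleaides')]
```

The single-pass pairing avoids one extra XPath query per event element, which matters when there are many of them.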

Clicking button within span with no ID using Selenium

I'm trying to click a button in the browser with Selenium and Python.
The button is within the following
<div id="generate">
<i class="fa fa-bolt"></i>
<span>Download Slides</span>
<div class="clear"></div>
</div>
Chrome's dev console tells me the button is within <span> but I have no idea how to reference the button for a .click().
Well, if you just want to click on an element without an id or name, I'd suggest three ways to do it:
use xpath:
driver.find_element_by_xpath('//*[#id="generate"]/span')
use CSS selector:
driver.find_element_by_css_selector('#generate > span')
Just try .find_element_by_tag_name() like:
driver.find_element_by_id('generate').find_elements_by_tag_name('span')[0]
Note that this approach first gets the generate <div> element by its id, and then finds all the <span> elements under that <div>.
Finally, it gets the first <span> element using [0].