Selenium Python extract text between Span

I am trying to extract the text "Margareta Osborn" from the HTML below using Python with Selenium, but I keep getting blank values when I print. I have also tried get_attribute and still get blank values.
<div class="author-info hidden-md">
By (author)
<span itemprop="author" itemtype="http://schema.org/Person" itemscope="Margareta Osborn">
<a href="/author/Margareta-Osborn" itemprop="url">
<span itemprop="name">
Margareta Osborn</span>
</a>
</span>
</div>
Below is my Python code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.bookdepository.com/")
keyword = "9781925324402"
Search = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="book-search-form"]/div[1]/input[1]'))
)
Search.clear()
Search.send_keys(keyword)
Search.send_keys(Keys.RETURN)
try:
    authors = driver.find_element_by_xpath("//div[@class='author-info hidden-md']/span/a/span").text
    print(authors)
    driver.quit()
except:
    authors = "Not Available"
    print(authors)
    driver.quit()

You need to read the .text property, which the Selenium Python bindings provide on every web element:
authors = driver.find_element_by_xpath("//div[@class='author-info hidden-md']/span/a/span").text
print(authors)
or
authors = driver.find_element_by_xpath("//a[contains(@href,'/author/Margareta-Osborn')]").get_attribute('innerHTML')
print(authors)
Update 1 :
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.bookdepository.com/Rose-River-Margareta-Osborn/9781925324402")
authors = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.author-info.hidden-md span[itemprop='author'] span"))).text
print(authors)

You seem to be missing .text, which may be why you are getting a junk value: without it you receive only a reference to the web element rather than its text.
Using .text -
# XPath of the element
element = "//span[@itemprop='name']"
# Locate the element with the driver and read its text
author = driver.find_element_by_xpath(element).text
# Print the text
print(author)
Using JavaScriptExecutor -
driver.execute_script('return arguments[0].innerText;', driver.find_element_by_xpath(element))
Using Get Attribute -
driver.find_element_by_xpath(element).get_attribute('innerText')

To get the value from the span, use WebDriverWait() to wait for visibility_of_element_located() with the following CSS selector, then read either .text or .get_attribute("textContent"):
driver.get('https://www.bookdepository.com/Rose-River-Margareta-Osborn/9781925324402')
print(WebDriverWait(driver,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.author-info.hidden-md [itemprop="author"]'))).text)
print(WebDriverWait(driver,5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.author-info.hidden-md [itemprop="author"]'))).get_attribute("textContent"))
You need the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
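The answers above alternate between .text and get_attribute("textContent"). A minimal sketch of combining the two follows. Note that element_text and FakeElement are hypothetical names introduced here for illustration, not part of Selenium; the helper only assumes an object that exposes .text and get_attribute(), as Selenium's WebElement does.

```python
def element_text(element):
    """Return the element's text, falling back to textContent if .text is blank."""
    text = (element.text or "").strip()
    if not text:
        # .text reports only rendered text; textContent also covers hidden nodes.
        text = (element.get_attribute("textContent") or "").strip()
    return text


class FakeElement:
    """Stand-in for a WebElement (assumption: used only to try the helper offline)."""

    def __init__(self, text, attributes):
        self.text = text
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


# An element whose rendered text is blank but whose textContent holds the name.
hidden = FakeElement("", {"textContent": "\nMargareta Osborn"})
print(element_text(hidden))
```

With a real driver, the same helper could be passed the element returned by WebDriverWait(...).until(...).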

Related

Finding value using Xpath (with no unique identifiers) in Python Selenium

I'm having trouble getting the colour of a vehicle in Selenium using Python. I've checked YouTube, Stack Overflow and all the usual resources but can't seem to find an answer that makes sense (I'm relatively new to Python and Selenium). I'm currently working on a project to automate fetching the vehicle colour from the gov.uk website into an Excel sheet, based on the vehicle registration number already present in the spreadsheet. The code isn't finished yet, as I want to get over this XPath hurdle first!
I need to fetch the 'Blue' value from this code:
<dl class="summary-no-action">
<div class="govuk-summary-list__row">
<dt>Registration number</dt>
<dd>
<div class="reg-mark-sm">WJ06 HYF</div>
</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Make</dt>
<dd>VOLKSWAGEN</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Colour</dt>
<dd>BLUE</dd>
</div>
</dl>
However, as you can see, there is no specific ID, class, or tag name to work with, so I'm assuming XPath is my only option? Could anyone help me with the best way to implement this? My assumption is that I need to find the first 'dd' tag underneath 'Colour', but I don't know how to write this!
Here is the code snippet I'm working on that I have so far:
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_element(By.LINK_TEXT, "Colour")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()
I'm aware the line 'div = main.find_element(By.LINK_TEXT, "Colour")' is incorrect, but I need to replace it with something so that I may fetch the colour present in the 'dd' tag underneath.
This is what I had originally, but it brings back all the values in the "summary-no-action" class name:
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_elements(By.CLASS_NAME, "govuk-summary-list__row")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()
Any help would be appreciated!
EDIT:
For reference, here is the whole code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.google.com")
driver.get("https://vehicleenquiry.service.gov.uk/")
time.sleep(5)
search = driver.find_element(By.ID, "wizard_vehicle_enquiry_capture_vrn_vrn")
search.send_keys("wj06hyf")
search.send_keys(Keys.RETURN)
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_elements(By.CLASS_NAME, "govuk-summary-list__row")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()

Use the following XPath to get the value BLUE. It first identifies the dt tag with the text Colour and then takes the first dd tag that follows it:
//dt[text()='Colour']/following::dd[1]
Code:
print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//dt[text()='Colour']/following::dd[1]"))).text)

To fetch the text BLUE, use WebDriverWait with visibility_of_element_located() and either of the following locator strategies:
Using CSS_SELECTOR and get_attribute("innerHTML"):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "dl.summary-no-action div:last-child dd"))).get_attribute("innerHTML"))
Using XPATH and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//dl[@class='summary-no-action']//div[@class='govuk-summary-list__row']/dt[text()='Colour']//following-sibling::dd[1]"))).text)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
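The "find the dt labelled Colour, then take its dd" idea can be checked offline against the HTML from the question. The sketch below swaps Selenium for the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no following-sibling axis), so it walks the rows instead of using the exact expressions above. summary_to_dict is a hypothetical helper name, and the approach assumes the markup is well-formed, as the snippet in the question is.

```python
import xml.etree.ElementTree as ET

# The summary list exactly as posted in the question.
SUMMARY_HTML = """
<dl class="summary-no-action">
<div class="govuk-summary-list__row">
<dt>Registration number</dt>
<dd>
<div class="reg-mark-sm">WJ06 HYF</div>
</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Make</dt>
<dd>VOLKSWAGEN</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Colour</dt>
<dd>BLUE</dd>
</div>
</dl>
"""


def summary_to_dict(markup):
    """Map each dt label to the text of the dd in the same row."""
    root = ET.fromstring(markup.strip())
    details = {}
    for row in root.findall("div"):
        label = row.findtext("dt", default="").strip()
        value_node = row.find("dd")
        if label and value_node is not None:
            # itertext() also collects text from nested tags such as the reg-mark div.
            details[label] = "".join(value_node.itertext()).strip()
    return details


details = summary_to_dict(SUMMARY_HTML)
print(details["Colour"])  # BLUE
```

Collecting every row into a dictionary may suit the spreadsheet use case, since the make and registration number come along for free.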

Get the element by title inside repeatable class in Selenium Webdriver in Python

I want to click on an element that has both a class and a title, using Selenium in Python.
The web page repeats the same class on many elements, none of which has an id, but each has a unique title.
I want to detect and click on the element with the title 'PaymateSolutions' once it loads in the page.
Below is the HTML. I have tried many approaches, but I keep ending up with errors.
FYI, I can't use find element by class because the classes are not unique.
<div class="MuiGrid-root MuiGrid-item" title="PaymateSolutions">
<p class="MuiTypography-root jss5152 MuiTypography-body1">PaymateSolutions</p>
</div>
A few approaches that I tried to get the element based on its title using XPath:
Approach 1:-
wait = WebDriverWait(driver, 20)
element = wait.until(EC.element_to_be_clickable((By.XPATH,"//class[@title='PaymateSolutions']")))
Approach 2:-
element2 = (WebDriverWait(driver, 30).until(
    EC.visibility_of_element_located((By.XPATH, "//p[@title='PaymateSolutions']")))
)
Approach 3:-
element2 = (WebDriverWait(driver, 30).until(
    EC.visibility_of_element_located((By.XPATH, "//[@title='PaymateSolutions']")))
)
Can someone please help here?

For Approach 1 - title is an attribute of the div tag, and class is not a tag name. So the XPath would be something like this:
//div[@title='PaymateSolutions']
For Approach 2 - the p tag has no title attribute. PaymateSolutions is the text of the p tag. The XPath should be something like this:
//p[text()='PaymateSolutions']
For Approach 3 - there is no tag name in the XPath. The XPath would be:
//*[@title='PaymateSolutions']
Or
//div[@title='PaymateSolutions']
We can apply Explicit waits like below:
# Imports required for Explicit waits:
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver.get(url)
wait = WebDriverWait(driver,30)
payment_option = wait.until(EC.element_to_be_clickable((By.XPATH,"xpath for PaymateSolutions option")))
payment_option.click()

All the XPaths you have tried look slightly off. Please use the XPath below:
//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']
Code trial 1:
time.sleep(5)
driver.find_element_by_xpath("//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']").click()
Code trial 2:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']"))).click()
Code trial 3:
time.sleep(5)
button = driver.find_element_by_xpath("//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']")
driver.execute_script("arguments[0].click();", button)
Code trial 4:
time.sleep(5)
button = driver.find_element_by_xpath("//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']")
ActionChains(driver).move_to_element(button).click().perform()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
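Since the title is the only unique handle here, it can help to build the XPath from the title value rather than hand-writing it each time. The following is a small sketch; xpath_literal and xpath_by_title are hypothetical helper names, not Selenium functions. XPath 1.0 has no escape character inside string literals, so a title containing both quote types is assembled with concat().

```python
def xpath_literal(value):
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: join the single-quote-free pieces with concat().
    pieces = [f"'{piece}'" for piece in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def xpath_by_title(title, tag="*"):
    """Build an XPath that matches elements of the given tag by their title attribute."""
    return f"//{tag}[@title={xpath_literal(title)}]"


print(xpath_by_title("PaymateSolutions", tag="div"))
```

The resulting string can be passed to EC.element_to_be_clickable((By.XPATH, ...)) as in Code trial 2.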

How can I print email address using Selenium Python

<div id="MainCopy_ctl13_presentJob_EmailAddressPanel">
<a id="MainCopy_ctl13_presentJob_EmailAddress" href="mailto:dburse@bjcta.org">xyzmmm@tccp.org</a>
</div>
I have tried using
email = browser.find_elements_by_xpath('//div[@id="MainCopy_ctl13_presentJob_EmailAddress"]//a').text
print(email)
But I'm not getting a result.

The email address is in the href of the a tag, so just do this:
Using Selenium:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
a_tag = driver.find_element_by_id('MainCopy_ctl13_presentJob_EmailAddress')
mail_link = a_tag.get_attribute("href")
mail_addrs = mail_link.split(':')[1]
print(mail_addrs)
Using Beautifulsoup:
from bs4 import BeautifulSoup
content="""
<div id="MainCopy_ctl13_presentJob_EmailAddressPanel">
<a id="MainCopy_ctl13_presentJob_EmailAddress" href="mailto:dburse@bjcta.org">xyzmmm@tccp.org</a>
</div>"""
soup = BeautifulSoup(content, "html.parser")
a_tag = soup.find(id='MainCopy_ctl13_presentJob_EmailAddress')
mail_link = a_tag.attrs['href']
mail_addrs = mail_link.split(':')[1]
print(mail_addrs)
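The split(':')[1] step above works for a plain mailto link, but a mailto href may also carry a query string such as ?subject=..., which would then end up in the result. A hedged alternative using only the standard library is sketched below; email_from_mailto is a hypothetical helper name.

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href):
    """Return the address part of a mailto: link, or None for any other link."""
    parts = urlsplit(href)
    if parts.scheme != "mailto":
        return None
    # The path holds the address; any ?subject=... lands in parts.query instead.
    return unquote(parts.path) or None


print(email_from_mailto("mailto:dburse@bjcta.org"))
print(email_from_mailto("mailto:dburse@bjcta.org?subject=Hello"))
```

Both calls print the bare address, so the helper can be applied directly to the value returned by get_attribute("href").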

.text returns only the visible text; use the textContent attribute for text that is not displayed:
email = browser.find_element_by_xpath('//div[@id="MainCopy_ctl13_presentJob_EmailAddressPanel"]//a').get_attribute("textContent")
print(email)

Is the element already there? Perhaps the code executes before the element has been loaded.
Consider using a wait:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
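Presence of the element does not guarantee that its text has been filled in yet. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, so a custom condition can wait for non-empty text. The sketch below is one way to write that; TextNotEmpty, FakeDriver and FakeElement are hypothetical names, and the fakes exist only so the condition can be exercised without a browser.

```python
class TextNotEmpty:
    """Wait condition: truthy (the text itself) once the located element has text."""

    def __init__(self, locator):
        self.locator = locator  # a (by, value) tuple such as (By.ID, "some-id")

    def __call__(self, driver):
        text = driver.find_element(*self.locator).text.strip()
        return text or False


class FakeElement:
    """Stand-in for a WebElement (assumption: demonstration only)."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in driver whose element text changes on each lookup."""

    def __init__(self, texts):
        self._texts = iter(texts)

    def find_element(self, by, value):
        return FakeElement(next(self._texts))


condition = TextNotEmpty(("id", "MainCopy_ctl13_presentJob_EmailAddress"))
fake_driver = FakeDriver(["", "  ", "xyzmmm@tccp.org"])
# Poll three times, as a wait loop would: blank, blank, then the address.
results = [condition(fake_driver) for _ in range(3)]
print(results)
```

With a real driver this would be used as WebDriverWait(driver, 10).until(TextNotEmpty((By.ID, "MainCopy_ctl13_presentJob_EmailAddress"))), which returns the text once it is non-empty.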

The id attribute you used, MainCopy_ctl13_presentJob_EmailAddress, belongs to the <a> tag, not the <div>.
To print the email address you can use either of the following locator strategies:
Using CSS_SELECTOR and get_attribute():
print(driver.find_element(By.CSS_SELECTOR, "a#MainCopy_ctl13_presentJob_EmailAddress").get_attribute("innerHTML"))
Using XPATH and the text attribute:
print(driver.find_element(By.XPATH, "//a[@id='MainCopy_ctl13_presentJob_EmailAddress']").text)
Ideally, wait with WebDriverWait for visibility_of_element_located(), using either of the following locator strategies:
Using CSS_SELECTOR and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a#MainCopy_ctl13_presentJob_EmailAddress"))).text)
Using XPATH and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@id='MainCopy_ctl13_presentJob_EmailAddress']"))).get_attribute("innerHTML"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Selenium can't get the text from element

I can't get the text from the element. I think the text is added dynamically (by Angular) and is therefore not present when the element first loads. The text inside the element has a format such as "3", with quotation marks around it.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import xlsxwriter
import re
pattern = r"[\"\d{1, 2}\"]"
PATH = r"C:\Program Files (x86)\chromedriver.exe"
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
driver = webdriver.Chrome(PATH, chrome_options=chrome_options)
driver.get("some-url")
xpathPain = "/html/body/div[2]/div/div/div[1]/div/div/div[1]/div[3]/div/div/div[1]/div[3]/development-numbers/status-numbers/div/div[2]/div/h4"
try:
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.XPATH, xpathPain)))
    elementPain = driver.find_element_by_xpath(xpathPain)
    print(elementPain.text)
except TimeoutException:
    print("Failed to load elementPain")
The output is blank, like an empty string. I have tried waiting until the text is loaded with the EC text_to_be_present_in_element(locator, text_), and I have tried using a regular expression for the text part.
The page source for the element is:
<h4 class="status-numbers__number">
"6"
<!---->
</h4>
So how do I get the number 6 from this element?
I have tried print(elementPain.get_attribute("innerHTML")) and that gets the "<!---->" part of the text but not the '"6"' part. I have also tried .get_attribute("innerText") and .get_attribute("textContent").
I have tried using the Firefox geckodriver instead as well. No result.
I have managed to solve the issue using Firefox and this code:
try:
    element = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, xpathPain)))
    elementPain = driver.find_element_by_xpath(xpathPain)
    print(elementPain.get_attribute("innerHTML"))
except TimeoutException:
    print("Failed to load elementPain")
I don't know if it had to do with the element being out of the viewport.

Use the following XPath to identify the element.
You can use element.text or element.get_attribute("textContent") to get the text.
try:
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//h4[@class='status-numbers__number']")))
    elementPain = driver.find_element_by_xpath("//h4[@class='status-numbers__number']")
    print(elementPain.text)  # Get the text using .text
    print(elementPain.get_attribute("textContent"))  # Get the text using get_attribute()
except TimeoutException:
    print("Failed to load elementPain")
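Once the text comes back, it may still carry surrounding whitespace and, depending on whether the quotation marks are really part of the text node, the quotes shown in the question. A minimal sketch for turning either form into a number follows; parse_quoted_int is a hypothetical helper name, and it assumes the value is a whole number, as in the question's example.

```python
import re


def parse_quoted_int(raw):
    """Extract the first integer from text such as '"6"', with or without quotes."""
    match = re.search(r"-?\d+", raw or "")
    return int(match.group()) if match else None


# Text shaped like the h4 content in the question: newline, quoted digit, newline.
print(parse_quoted_int('\n"6"\n'))
```

Returning None for blank input makes it easy to tell "no number yet" apart from a real value of 0.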

Selenium - 'None' value from element

I want to get the value (the price) of a stock from a trading website. The problem is that when I use the get_attribute method like this:
.get_attribute('')
I can't find anything to put between the quotes that gives me the value of the stock.
Here is the line as shown when using inspect:
<span _ngcontent-c31="" class="price__value" style="" xpath="1"> 187.510 </span>
This is the code below that i've been making for this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
browser = webdriver.Chrome('/Users/ludvighenriksen/downloads/chromedriver')
browser.get('https://www.forex.com/en-uk/account-login/')
username_elem = browser.find_element_by_name('Username')
username_elem.send_keys('kebababdulaziz#gmail.com')
password_elem = browser.find_element_by_name('Password')
password_elem.send_keys('KEbababdulaziz')
password_elem.send_keys(Keys.ENTER)
time.sleep(5)
search_elem = WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, "input.market-search__search-input")))
search_elem.click()
search_elem.send_keys('FB')
search_click_elem = WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
    (By.XPATH, "//app-market-table[@class='search-results-element ng-star-inserted']//div[@class='price--buy clickable-price arrows-flashing']")))
browser.execute_script("arguments[0].click();", search_click_elem)
price_elem = browser.find_element_by_css_selector("div.mercury:nth-child(2) div.mercury__body:nth-child(4) div.mercury__body-content-container app-workspace.ng-star-inserted:nth-child(3) div.panel-container:nth-child(1) app-workspace-panel.active.ng-star-inserted div.workspace-panel-content.workspace-panel-content--no-scroll-vertical.workspace-panel-content--no-scroll-horizontal.workspace-panel-content--auto-size div.workspace-panel-content__component.workspace-panel-content__component--auto-size app-deal-ticket.ng-star-inserted form.ticket-form.ng-untouched.ng-pristine.ng-invalid.ng-star-inserted div.market-prices app-market-prices.main-prices.ng-untouched.ng-pristine.ng-valid div.market-prices div.market-prices__direction label.buy.selected span.price.ng-star-inserted:nth-child(2) > span.price__value")
price_value = price_elem.get_attribute('value')
print(price_value)
Using 'value' isn't working, which makes sense I guess, but I think I've tried everything I could think of, and it prints None.
The login for the website is included if you want to try it out. Thanks in advance.

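If you want to access the text content of a tag, use the .text property, for example price_elem.text. get_attribute('value') returns None here because the span has no value attribute.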
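The span's text in the question is " 187.510 " with padding spaces, so the string from .text usually needs a little cleanup before it can be used as a number. A sketch, assuming the price uses a dot as the decimal separator and possibly commas as thousands separators; parse_price is a hypothetical helper name.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw):
    """Convert price text such as ' 187.510 ' to a Decimal, or None if it is not numeric."""
    cleaned = (raw or "").strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_price(" 187.510 "))
```

Decimal is used here rather than float so that the trailing zero and exact digits of the quoted price are preserved.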
