Finding value using Xpath (with no unique identifiers) in Python Selenium

I'm having trouble getting the colour of a vehicle in Selenium using Python. I've checked YouTube, Stack Overflow and all the usual resources but can't find an answer that makes sense (I'm relatively new to Python and Selenium). I'm working on a project that automatically fetches the vehicle colour from the gov.uk website into an Excel sheet, based on the vehicle registration number already present in the spreadsheet. The code isn't finished yet, as I want to get over this XPath hurdle first!
I need to fetch the 'BLUE' value from this HTML:
<dl class="summary-no-action">
<div class="govuk-summary-list__row">
<dt>Registration number</dt>
<dd>
<div class="reg-mark-sm">WJ06 HYF</div>
</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Make</dt>
<dd>VOLKSWAGEN</dd>
</div>
<div class="govuk-summary-list__row">
<dt>Colour</dt>
<dd>BLUE</dd>
</div>
</dl>
However, as you can see, they have made it very difficult, as there is no specific ID, class, tag name, etc. to work with, so I'm assuming XPath is my only option? Could anyone help me with the best way to implement this? My assumption is that I need to find the first 'dd' tag underneath 'Colour', but I don't know how to write this!
Here is the code snippet I have so far:
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_element(By.LINK_TEXT, "Colour")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()
I'm aware the line 'div = main.find_element(By.LINK_TEXT, "Colour")' is incorrect, but I need to replace it with something so that I may fetch the colour present in the 'dd' tag underneath.
This is what I had originally, but it brings back all the values in the "summary-no-action" class name:
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_elements(By.CLASS_NAME, "govuk-summary-list__row")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()
Any help would be appreciated!
EDIT:
For reference, here is the whole code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.google.com")
driver.get("https://vehicleenquiry.service.gov.uk/")
time.sleep(5)
search = driver.find_element(By.ID, "wizard_vehicle_enquiry_capture_vrn_vrn")
search.send_keys("wj06hyf")
search.send_keys(Keys.RETURN)
try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "summary-no-action"))
    )
    div = main.find_elements(By.CLASS_NAME, "govuk-summary-list__row")
    for article in div:
        header = article.find_element(By.TAG_NAME, "dd")
        print(header.text)
finally:
    driver.quit()

Use the following XPath to get the value BLUE. It first identifies the dt tag with the text Colour and then takes the first dd tag that follows it:
//dt[text()='Colour']/following::dd[1]
Code:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//dt[text()='Colour']/following::dd[1]"))).text)
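The "find the row by its label, then read its dd" idea can be checked without launching a browser. Below is a minimal sketch using only the standard library, run against the markup from the question. Two assumptions: the helper name summary_value is made up for illustration, and because ElementTree does not support the following:: axis, the sketch uses the predicate form .//div[dt='Colour']/dd instead. The same expression written as //div[dt='Colour']/dd is valid XPath 1.0, so it should also work with By.XPATH.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question; it happens to be well-formed XML,
# so the standard-library parser can read it directly.
SNIPPET = """
<dl class="summary-no-action">
  <div class="govuk-summary-list__row">
    <dt>Registration number</dt>
    <dd><div class="reg-mark-sm">WJ06 HYF</div></dd>
  </div>
  <div class="govuk-summary-list__row">
    <dt>Make</dt>
    <dd>VOLKSWAGEN</dd>
  </div>
  <div class="govuk-summary-list__row">
    <dt>Colour</dt>
    <dd>BLUE</dd>
  </div>
</dl>
"""


def summary_value(root, label):
    """Return the <dd> text of the row whose <dt> text equals `label` (hypothetical helper)."""
    # [dt='Colour'] keeps only the row <div> that has a <dt> child with that text.
    dd = root.find(f".//div[dt='{label}']/dd")
    if dd is None:
        return None
    # itertext() also collects text from nested tags such as the reg-mark <div>.
    return "".join(dd.itertext()).strip()


root = ET.fromstring(SNIPPET)
colour = summary_value(root, "Colour")
print(colour)
```

Anchoring on the label rather than on the row position means the lookup keeps working if rows are added or reordered.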

To fetch the text BLUE, wait with WebDriverWait for visibility_of_element_located() and use either of the following locator strategies:
Using CSS_SELECTOR and get_attribute("innerHTML") (this assumes Colour is the last row, as in the snippet in the question):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "dl.summary-no-action div:last-child dd"))).get_attribute("innerHTML"))
Using XPATH and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//dl[@class='summary-no-action']//div[@class='govuk-summary-list__row']/dt[text()='Colour']//following-sibling::dd[1]"))).text)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
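Since the project will look up more than one field, it may help to build the XPath from the label instead of hard-coding it. The following is a rough sketch under stated assumptions: xpath_literal and dd_xpath are hypothetical helper names, and normalize-space() is swapped in for text() so that stray whitespace around the label does not break the match. The resulting string is meant to be passed to By.XPATH.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: XPath 1.0 has no escape character,
    # so stitch the pieces together with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def dd_xpath(label):
    """Build an XPath for the first <dd> sibling after the <dt> with this label."""
    return f"//dt[normalize-space()={xpath_literal(label)}]/following-sibling::dd[1]"


print(dd_xpath("Colour"))
print(dd_xpath("Registration number"))
```

With this in place, the wait call would take (By.XPATH, dd_xpath("Colour")) as its locator.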

Related

Unable to find element that looks like it's dynamically generated

I'm trying to find the email address text located in the below URL. You can clearly see the email address but I believe the text is generated dynamically through Javascript/React. When I copy the XPATH or CSS Selector and try to find the element like I would any other element I just get an error saying the element cannot be found.
I've tried time.sleep(30) to give the page time to fully load but that's not the issue.
I've tried:
driver.find_element(By.XPATH, '//*[@id="mount_0_0_D8"]/div/div[1]/div/div[5]/div/div/div[3]/div/div/div[1]/div[1]/div/div/div[4]/div[2]/div/div[1]/div[2]/div/div[1]/div/div/div/div/div[2]/div[2]/div/ul/div[2]/div[2]/div/div/span')
You can see from the snippet below that the email is visible but is between some ::before and ::after text I've not seen before.
https://www.facebook.com/homieestudio
Any ideas on how to consistently pull back the email address here? I'm using Chromedriver.

The clue is that the email address will always contain the @ sign.
Solution
To extract the text Info@homieestudio.co.uk, ideally wait with WebDriverWait for visibility_of_element_located() and use either of the following locator strategies:
Using XPATH and the text attribute:
driver.get('https://www.facebook.com/homieestudio')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., '@')]"))).text)
Using XPATH and get_attribute("textContent"):
driver.get('https://www.facebook.com/homieestudio')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., '@')]"))).get_attribute("textContent"))
Console output:
Info@homieestudio.co.uk
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
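Because a span that merely contains an @ sign could also be a handle or a mention, it may be worth filtering the collected texts with a regular expression afterwards. This is a minimal sketch, assuming you already have the element texts as plain strings; the name first_email and the sample strings are made up for illustration, and the pattern is deliberately simple rather than a full email validator.

```python
import re

# Deliberately simple pattern: something@domain.tld (not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email(texts):
    """Return the first email-looking substring found in `texts`, or None."""
    for text in texts:
        match = EMAIL_RE.search(text)
        if match:
            return match.group(0)
    return None


# Hypothetical element texts, as they might come back from find_elements(...).
sample_texts = ["Homiee Studio", "@homieestudio", "Contact: Info@homieestudio.co.uk"]
print(first_email(sample_texts))
```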

You can find the email address using the code below:
wait = WebDriverWait(driver, 20)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[@dir='auto'][contains(text(), 'info@homieestudio.co.uk')]")))
OR
wait = WebDriverWait(driver, 20)
element = wait.until(EC.presence_of_element_located((By.XPATH, "//span[@dir='auto'][contains(text(), 'info@homieestudio.co.uk')]")))
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Get the element by title inside repeatable class in Selenium Webdriver in Python

I want to click on an element that has both a class and a title, using Selenium in Python.
The webpage contains a repeated class without any id, but with a unique title.
I want to detect and click on the element with the title 'PaymateSolutions' once it loads in the page.
Below is the HTML. I tried many ways but I keep ending up with errors.
FYI, I can't find the element by class because the classes are not unique.
<div class="MuiGrid-root MuiGrid-item" title="PaymateSolutions">
<p class="MuiTypography-root jss5152 MuiTypography-body1">PaymateSolutions</p>
</div>
A few approaches that I tried to get the element based on its title using XPath:
Approach 1:
wait = WebDriverWait(driver, 20)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//class[@title='PaymateSolutions']")))
Approach 2:
element2 = WebDriverWait(driver, 30).until(
    EC.visibility_of_element_located((By.XPATH, "//p[@title='PaymateSolutions']"))
)
Approach 3:
element2 = WebDriverWait(driver, 30).until(
    EC.visibility_of_element_located((By.XPATH, "//[@title='PaymateSolutions']"))
)
Can someone please help here?

For Approach 1: title is an attribute of the div tag, and class is an attribute rather than a tag name, so the XPath would be something like:
//div[@title='PaymateSolutions']
For Approach 2: the p tag has no title attribute. PaymateSolutions is the text of the p tag, so the XPath should be something like:
//p[text()='PaymateSolutions']
For Approach 3: there is no tag name in the XPath. It would be:
//*[@title='PaymateSolutions']
Or
//div[@title='PaymateSolutions']
We can apply explicit waits like below:
# Imports required for explicit waits:
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver.get(url)
wait = WebDriverWait(driver, 30)
payment_option = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@title='PaymateSolutions']")))
payment_option.click()
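To see why the original approaches matched nothing, the selectors can be tried against the markup from the question without a browser. This is a small sketch using the standard library only. Assumptions: the snippet is wrapped in a body element so the div is a descendant of the parsed root, and because ElementTree lacks text(), the text match is written as [.='PaymateSolutions'] here.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in <body> (an assumption for this sketch).
SNIPPET = """
<body>
  <div class="MuiGrid-root MuiGrid-item" title="PaymateSolutions">
    <p class="MuiTypography-root jss5152 MuiTypography-body1">PaymateSolutions</p>
  </div>
</body>
"""

root = ET.fromstring(SNIPPET)

# Approach 2 looked for a title attribute on <p>; the <p> has none.
p_with_title = root.findall(".//p[@title='PaymateSolutions']")

# The title lives on the <div>, so either of these finds it.
div_by_tag = root.find(".//div[@title='PaymateSolutions']")
any_by_title = root.find(".//*[@title='PaymateSolutions']")

# The <p> can be matched by its text instead.
p_by_text = root.find(".//p[.='PaymateSolutions']")

print(len(p_with_title), div_by_tag.tag, any_by_title.tag, p_by_text.tag)
```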

All the XPaths that you've been trying seem a bit wrong. Please use the XPath below:
//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']
Code trial 1:
time.sleep(5)
driver.find_element(By.XPATH, "//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']").click()
Code trial 2:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']"))).click()
Code trial 3:
time.sleep(5)
button = driver.find_element(By.XPATH, "//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']")
driver.execute_script("arguments[0].click();", button)
Code trial 4:
time.sleep(5)
button = driver.find_element(By.XPATH, "//div[@title='PaymateSolutions']//p[text()='PaymateSolutions']")
ActionChains(driver).move_to_element(button).click().perform()
Imports:
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

Selenium Python extract text between Span

I am trying to extract the text "Margareta Osborn" from the HTML below via Python with Selenium, but I keep getting blank values when I print.
I have tried get_attribute as well and still get blank values.
<div class="author-info hidden-md">
By (author)
<span itemprop="author" itemtype="http://schema.org/Person" itemscope="Margareta Osborn">
<a href="/author/Margareta-Osborn" itemprop="url">
<span itemprop="name">
Margareta Osborn</span>
</a>
</span>
</div>
Below is my code for Python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.bookdepository.com/")
keyword = "9781925324402"
Search = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="book-search-form"]/div[1]/input[1]'))
)
Search.clear()
Search.send_keys(keyword)
Search.send_keys(Keys.RETURN)
try:
    authors = driver.find_element_by_xpath("//div[@class='author-info hidden-md']/span/a/span").text
    print(authors)
    driver.quit()
except:
    authors = "Not Available"
    print(authors)
    driver.quit()

You need to read the .text property, which is available on a web element in the Selenium Python bindings:
authors = driver.find_element(By.XPATH, "//div[@class='author-info hidden-md']/span/a/span").text
print(authors)
or
authors = driver.find_element(By.XPATH, "//a[contains(@href,'/author/Margareta-Osborn')]").get_attribute('innerHTML')
print(authors)
Update 1:
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.bookdepository.com/Rose-River-Margareta-Osborn/9781925324402")
authors = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.author-info.hidden-md span[itemprop='author'] span"))).text
print(authors)

You may be missing ".text" when reading the value, and because of that you may be getting a junk value. I suspect you are receiving just a reference ID for the element.
Using .text:
# XPath of the element
element = "//span[@itemprop='name']"
# Locate the element with the driver and read its text
author = driver.find_element(By.XPATH, element).text
# Print the text
print(author)
Using JavaScript:
driver.execute_script('return arguments[0].innerText;', driver.find_element(By.XPATH, element))
Using get_attribute:
driver.find_element(By.XPATH, element).get_attribute('innerText')

To get the value from the span, use WebDriverWait() to wait for visibility_of_element_located() with the following CSS selector,
and use either .text or .get_attribute("textContent"):
driver.get('https://www.bookdepository.com/Rose-River-Margareta-Osborn/9781925324402')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.author-info.hidden-md [itemprop="author"]'))).text)
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.author-info.hidden-md [itemprop="author"]'))).get_attribute("textContent"))
You need to import the libraries below:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
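One thing worth keeping in mind when choosing between the two: as far as I know, .text returns the text as rendered (so it comes back empty for an element that is not displayed), while textContent returns the raw text of the node, including the line breaks from the markup. If you go the textContent route, you will probably want to normalise the whitespace. Here is a minimal sketch of that step using the HTML from the question and the standard library only; the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question.
SNIPPET = """<div class="author-info hidden-md">
By (author)
<span itemprop="author" itemtype="http://schema.org/Person" itemscope="Margareta Osborn">
<a href="/author/Margareta-Osborn" itemprop="url">
<span itemprop="name">
Margareta Osborn</span>
</a>
</span>
</div>"""

root = ET.fromstring(SNIPPET)
name_span = root.find(".//span[@itemprop='name']")

# Raw text of the node, roughly what textContent would hand back.
raw = "".join(name_span.itertext())

# Collapse runs of whitespace (including the leading newline) into single spaces.
clean = " ".join(raw.split())

print(repr(raw))
print(repr(clean))
```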

How to extract the hashtags from twitter page explore?

I want to extract the tag names (hashtags) from the explore page on Twitter using Selenium on Python 3. But there are no special tags, classes or even ids that I can use to locate them and save them.
Is there a way that I can extract them even if they change, without having to edit my code every time?
I think the following code will take me to the explore page using the link text. But I cannot use the same method to locate the tags, as they change every now and then.
explore = driver.find_element_by_link_text("Explore")
I want to be able to locate the tags and save them into a list so I can use that list in my work later on.
This is the HTML for one of the tags:
<span class="r-18u37iz"><span dir="ltr" class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0">#ARSBUR</span></span>
The classes are not unique and they are used in other elements of the page, so I cannot use them.
Is there a way to locate the (#) mark so that I only get the text that includes it?

To extract the hashtags from the explore page on Twitter, i.e. https://twitter.com/explorer?lang=en, using Selenium on Python 3, wait with WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://twitter.com/explorer?lang=en")
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href^='/hashtag']>span.trend-name")))])
Using XPATH:
driver.get("https://twitter.com/explorer?lang=en")
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@href, '/hashtag')]/span[contains(@class, 'trend-name')]")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['#MCITOT', '#WorldSupportsKashmir', '#MCIvsTOT', '#11YearsOFViratism', '#ManCity']

You could dump the page source into BeautifulSoup 4.7.1+ and use :contains along with the class. Your classes appear different from the ones I see, but I am making an assumption about the URL.
N.B. On the page there can be other # entries under a different class, which would make the selector ".trend-name, .twitter-hashtag".
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
d = webdriver.Chrome(r'path\chromedriver.exe')
d.get('https://twitter.com/explorer?lang=en')
WebDriverWait(d, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".trend-name")))
soup = bs(d.page_source, 'lxml')
hashtag_trends = [i.text for i in soup.select('.trend-name:contains("#")')]
print(hashtag_trends)
Or, for Selenium only, test whether .text begins with #:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
d = webdriver.Chrome(r'path\chromedriver.exe')
d.get('https://twitter.com/explorer?lang=en')
hashtag_trends = [
    i.text
    for i in WebDriverWait(d, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".trend-name")))
    if i.text.startswith('#')
]

To locate a trending topic you can use XPath:
driver.find_element(By.XPATH, '(//*[contains(@class,"trend-name")])[1]').text
driver.find_element(By.XPATH, '(//*[contains(@class,"trend-name")])[1]').click()
You can count the elements with:
len_locator = driver.find_elements(By.XPATH, '//*[contains(@class,"trend-name")]')
print(len(len_locator))
Or, if you only want elements whose text starts with #, you can use:
driver.find_element(By.XPATH, '(//*[@dir="ltr" and starts-with(text(), "#")])[1]').text
driver.find_element(By.XPATH, '(//*[@dir="ltr" and starts-with(text(), "#")])[1]').click()
You can count the elements with:
len_locator = driver.find_elements(By.XPATH, '//*[@dir="ltr" and starts-with(text(), "#")]')
print(len(len_locator))
The [1] selects the first trending topic; for the second and so on, replace [1] with [2] etc. Iterate over the result of find_elements to grab them all.
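Whichever locator you end up with, the texts you collect will probably include non-hashtag trends and repeats, so a small post-processing step helps before saving them to a list. A minimal sketch follows; keep_hashtags is a hypothetical helper name, and the sample list mixes hashtags from the console output above with made-up non-hashtag entries.

```python
def keep_hashtags(texts):
    """Keep entries that start with '#', dropping duplicates but preserving order."""
    seen = set()
    result = []
    for text in texts:
        tag = text.strip()
        if tag.startswith("#") and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result


# Hypothetical element texts, e.g. [el.text for el in driver.find_elements(...)].
collected = ["#MCITOT", "Trending in UK", " #ManCity ", "#MCITOT", ""]
hashtags = keep_hashtags(collected)
print(hashtags)
```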

selenium using python: how to correctly click() an element?

While learning how to use Selenium, I'm trying to click an element, but nothing happens and I'm unable to reach the next page. This is the relevant page: http://buyme.co.il and I'm trying to click: הרשמה ("registration")
I managed to print the desired element (הרשמה), so I guess I'm reaching the correct place in the page, but click() doesn't work.
The second span, <span>הרשמה</span>, is what I want to click:
<li data-ember-action="636">
<a>
<span class="seperator-link">כניסה</span>
<span>הרשמה</span>
</a>
</li>
for elem in driver.find_elements_by_xpath('//*[@id="ember591"]/div/ul[1]/li[3]/a/span[2]'):
    print(elem.text)
    elem.click()
I also tried this:
driver.find_element_by_xpath('//*[@id="ember591"]/div/ul[1]/li[3]/a').click()
I expected to get to the "lightbox" which contains the registration fields.
Any thoughts on the best way to accomplish this?

Explicit waits: an explicit wait is code you define to wait for a certain condition to occur before proceeding further in the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
browser.get("https://buyme.co.il/")
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.ID, 'ember591')))
elm = browser.find_elements(By.XPATH, '//div[@id="ember591"]/div/ul[1]/li[3]/a')
elm[0].click()
Update:
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'login')))
email = browser.find_elements(By.XPATH, "//form[@id='ember1005']/div[1]/label/input")
email[0].send_keys("abc@gmail.com")
password = browser.find_elements(By.XPATH, "//form[@id='ember1005']/div[2]/label/input")
password[0].send_keys("test1234567")
login = browser.find_elements(By.XPATH, '//form[@id="ember1005"]/button')
login[0].click()
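If it helps to picture what an explicit wait does, the idea is simply "call a condition repeatedly until it returns something truthy or a timeout expires". The sketch below illustrates that polling idea in plain Python. It is not Selenium's actual implementation, and wait_until and ready are hypothetical names used only for this illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "element became clickable": succeeds on the third check.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return "element" if calls["n"] >= 3 else None


result = wait_until(ready, timeout=2, poll=0.01)
print(result, calls["n"])
```

This is why an explicit wait is usually preferable to a fixed time.sleep(): it returns as soon as the condition holds instead of always paying the full delay.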

The desired element is an Ember.js-enabled element, so to locate it, wait with WebDriverWait for the element to be clickable, using the following locator strategy:
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='הרשמה']"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
