I have an HTML element as follows:
<a class="country" href="/es-co">
Columbia
</a>
How do I select that anchor element based on the content 'Columbia'? I can't use find_element_by_class_css_selector because a.country matches half a dozen elements. How do I select that element and click it using Selenium with Python (through IE, if that has any bearing)?
As an aside, I could have any number of links with the same text and CSS selectors. How would Selenium differentiate between them?
There's no find_element_by_class_css_selector. But you are right, you can't use class names.
The best way is to use href="/es-co", if it's unique.
find_element_by_css_selector("a[href='/es-co']")
Otherwise you can find by text using XPath
find_element_by_xpath(".//a[contains(text(), 'Columbia')]")
If many links share the same locator, you can index them, either in the XPath itself or in the list returned by Selenium.
For example, if you have ten 'Columbia' links:
find_element_by_xpath("(.//a[contains(text(), 'Columbia')])[10]") # one-based index, one element only; the parentheses make [10] count over all matches rather than per parent
find_elements_by_xpath(".//a[contains(text(), 'Columbia')]")[9] # find_elements_* returns a zero-based list
For an <a> with clickable text, Selenium also provides find_element_by_link_text and find_element_by_partial_link_text.
If there are many <a> elements with the same text or CSS class, the best bet for locating them is an XPath expression, which the Selenium APIs accept.
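A note on positional indexes, as a minimal sketch. In XPath, a trailing [n] on a step counts among the same-tag siblings of each parent, so .//a[...][2] is not necessarily the second matching link on the page; the parenthesized form (.//a[...])[2] indexes the whole result set. The snippet below uses Python's standard-library ElementTree as a stand-in for a browser (an assumption, so that it runs without Selenium), and the markup, including the /en-co link, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two links with identical text, each in its own <li>.
PAGE = """
<ul>
  <li><a class="country" href="/es-co">Columbia</a></li>
  <li><a class="country" href="/en-co">Columbia</a></li>
</ul>
"""

root = ET.fromstring(PAGE)

# Per-parent position: every <a> is the FIRST <a> inside its own <li>,
# so "the second <a> of its parent" matches nothing.
per_parent = root.findall(".//a[2]")

# Whole-result position: collect all matches, then index the list
# (zero-based in Python, so [1] is the second link).
all_links = [a for a in root.findall(".//a") if a.text == "Columbia"]
second_link = all_links[1]

print(len(per_parent))          # 0
print(second_link.get("href"))  # /en-co
```

In Selenium the whole-result form would be an XPath such as (.//a[contains(text(), 'Columbia')])[2], or indexing the list returned by the find_elements call.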
I am trying to extract data from multiple pages of search results where the HTML in question looks like so:
<ul>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
</ul>
I want to extract the text from the "li" tags, so I have:
text_data = WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,'Card___StyledLi4-ulg8ho-7.jmevwM')))
print(text_data.text)
to wait for and target the "li" items. However, I get a "TimeoutException" error.
If I locate a single "li" item by its XPath under the same conditions, the data is returned, which leads me to question whether I am inputting the class correctly.
Can anyone tell me what I'm doing wrong? Please let me know if there is any further information you'd like me to provide.
I believe the XPath for these list items would be //li[@class="Card___StyledLi4-ulg8ho-7 jmevwM"] (or //*[@class="Card___StyledLi4-ulg8ho-7 jmevwM"] if you want all elements with that class rather than just li tags). You can take a look at this cheatsheet and this tutorial for further rules and examples of XPath.
You can also just use CSS Selectors like (By.CSS_SELECTOR, '.Card___StyledLi4-ulg8ho-7.jmevwM') in this case.
You have used the wrong locator type: it should be CSS_SELECTOR. Also put a dot '.' in front of each class name, because the value is a class:
text_data = WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,'.Card___StyledLi4-ulg8ho-7.jmevwM')))
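One more detail worth sketching: visibility_of_all_elements_located returns a list of elements, so .text has to be read from each element rather than from the list. The helper names below (class_selector, element_texts, wait_for_card_texts) are hypothetical, and the Selenium imports sit inside the last function only so that the two pure helpers can be tried without Selenium installed.

```python
def class_selector(class_attr: str) -> str:
    """Turn a class attribute value such as "a b" into the CSS selector ".a.b"."""
    return "".join("." + name for name in class_attr.split())


def element_texts(elements) -> list:
    """Collect the stripped .text of each element, skipping empty ones."""
    return [el.text.strip() for el in elements if el.text.strip()]


def wait_for_card_texts(driver, class_attr: str, timeout: float = 10) -> list:
    """Wait until all matching elements are visible, then return their texts."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, class_selector(class_attr))
    cards = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )
    return element_texts(cards)
```

With a live driver, a call might look like wait_for_card_texts(driver, "Card___StyledLi4-ulg8ho-7 jmevwM"). Generated class names like these tend to change between builds, so the selector may need updating over time.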
If I am implementing string locators, such as:
continue_button: str = "button:has-text(\"Continue\")"
If there are multiple buttons on the same page that say "Continue" but lead to different paths, how do I select the correct one? Is there a way to add an index to that string locator?
There are several good practices for creating locators/selectors.
The official Playwright documentation describes each common selector type, how to use it, and what it does.
More information: https://playwright.dev/docs/selectors#text-selector
In your case, I would suggest always using a parent selector to locate the element.
When there is a button, try to find its unique parent:
By id
By unique class
Something else unique.
Example:
<div id=test>
<button id=continue-test>Continue</button>
</div>
In this case you can use the unique id of the button and not the text.
Selector css: #continue-test
But if you don't have a unique identifier for the button, you can use the parent and go down to the button.
Selector css: #test > button
Matching text with plain CSS is not possible, but with XPath it can look like this:
//button[text()="Continue"]
This selector MATCHES the text using "equals".
Using playwright:
button:has-text("Continue")
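Note that :has-text() matches elements that CONTAIN the text (a case-insensitive substring match), even when the text is quoted. For an exact match, use :text-is("Continue").
The selector text=Continue (without quotes) also matches all elements that CONTAIN the text "Continue"; the quoted form text="Continue" matches exactly.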
All of this is explained with examples in the official documentation for Playwright selectors.
None of this means you should avoid XPath.
CSS selectors are generally fast but limited when working with text.
XPath is somewhat slower but much more powerful for working with text and parent/child relationships.
I would suggest always starting from a parent element with a unique identifier and going down to the element that will receive the interaction.
Scenarios like this, and how easily they can be handled, are why I love Playwright.
If a string such as abc occurs multiple times on a single page, you can use the :nth-match() selector to pick the nth occurrence.
For example,
await page.locator(':nth-match(:text("abc"), 3)').click();
will select the 3rd occurrence of the word abc. Similarly, in your case, if you want to select the first or second or third, you can simply do
await page.locator(':nth-match(:text("Continue"), 1)').click();
await page.locator(':nth-match(:text("Continue"), 2)').click();
await page.locator(':nth-match(:text("Continue"), 3)').click();
Please refer to the Playwright Selectors documentation.
This is different from :nth-child(); as the documentation puts it:
Unlike :nth-child(), elements do not have to be siblings, they could
be anywhere on the page. In the snippet above, all three buttons match
:text("Buy") selector, and :nth-match() selects the third button.
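Since the question stores locators as Python strings, the index can be folded into the string itself. Below is a minimal sketch with two hypothetical helpers, nth_match and text_selector; it assumes the button text contains no double quotes, and it only builds the selector string without touching a page.

```python
def text_selector(text: str) -> str:
    """Build ':text("...")'; assumes the text contains no double quotes."""
    if '"' in text:
        raise ValueError("double quotes in the text are not handled by this sketch")
    return f':text("{text}")'


def nth_match(inner_selector: str, n: int) -> str:
    """Build a Playwright ':nth-match(...)' selector; n is one-based."""
    if n < 1:
        raise ValueError("n is one-based and must be >= 1")
    return f":nth-match({inner_selector}, {n})"


# Second "Continue" element on the page, kept as a plain string locator.
continue_button_2: str = nth_match(text_selector("Continue"), 2)
print(continue_button_2)  # :nth-match(:text("Continue"), 2)
```

The resulting string can be passed to page.locator(continue_button_2). An alternative that keeps the original string untouched is the zero-based page.locator('button:has-text("Continue")').nth(1).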
I have a database filled with keywords, and I have to get the XPaths of the web elements containing these words. (There is an expandable list button next to every word. To click on it, I need to get the XPath of the keyword and modify it to get the button's XPath.)
I can get the Selenium web element using
keyword_element = driver.find_element_by_xpath(f"//*[contains(text(), '{keyword}')]")
This only gets the Selenium WebElement for the element containing the keyword, like this:
<selenium.webdriver.remote.webelement.WebElement (session="1df8dbbae8c93fd772a5134ed0666a7a", element="b3d2bc8d-0c8c-4fb0-b3d1-2a5d49537567")>
and the HTML element in the dev tools looks like this:
<span class="fancytree-node actlitetreeitem fancytree-exp-n fancytree-ico-c">
<span class="fancytree-expander"></span>
<span class="fancytree-checkbox"></span>
<span class="fancytree-title">SNAPCHAT</span>
</span>
'SNAPCHAT' is the keyword and I need access to the 'fancytree-checkbox' element, but I have to get it using the keyword SNAPCHAT.
Is there any way to get the XPath of the web element out of this?
Use either of the following XPaths to identify the fancytree-checkbox element.
Use preceding-sibling:
keyword_element = driver.find_element_by_xpath(f"//*[contains(text(), '{keyword}')]/preceding-sibling::span[1]")
Or identify the parent node and then its child:
keyword_element = driver.find_element_by_xpath(f"//span[.//*[contains(text(), '{keyword}')]]//span[@class='fancytree-checkbox']")
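Because the keywords come from a database, some may contain apostrophes or double quotes, which would break an XPath built with a plain f-string. The sketch below shows one way to quote such values; xpath_literal and checkbox_xpath are hypothetical helper names, and the XPath assumes the title span's class attribute is exactly fancytree-title, as in the snippet from the question.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: join the apostrophe-free pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def checkbox_xpath(keyword: str) -> str:
    """XPath for the checkbox span that precedes the title span of a keyword."""
    title = (
        "//span[@class='fancytree-title']"
        f"[normalize-space()={xpath_literal(keyword)}]"
    )
    return title + "/preceding-sibling::span[@class='fancytree-checkbox']"


print(checkbox_xpath("SNAPCHAT"))
```

The string returned by checkbox_xpath(keyword) can be passed to the same find_element call used above.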
I'm trying to access text from elements that have different xpaths but very predictable href schemes across multiple pages in a web database. Here are some examples:
<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>
In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)"
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>
In this example I would want to extract "05C80 (05C15)". My web scraper would not be able to search by xpath directly due to the xpaths of my desired elements changing between pages, so I am looking for a more roundabout approach.
My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium can't directly search for hrefs, but I was thinking of doing something similar to this C# implementation:
Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))
To port this over to Python, the only analogous approach I could think of is the in operator, but I am not sure how the syntax would work when everything is nested in a find_element_by_xpath call. How would I bring all of these ideas together to obtain my desired text?
driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text
If I understand correctly, you want to locate all elements that share the same partial href. You can use this:
elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)
This gives a list of all matching elements. If you want to locate only one element (the first match):
driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text
As per the HTML you have shared, @AndreiSuvorkov's answer would probably cover your current requirement. You can get more granular and construct a tighter XPath by:
Using starts-with instead of contains
Including the ?code= part of the @href attribute
Your effective code block will be:
all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
for elem in all_elements:
    print(elem.get_attribute("innerHTML"))
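The link text in the question's HTML carries a leading line break and a trailing space, so it may be worth normalizing whitespace after extraction. The following is a minimal sketch using the CSS prefix operator ^= as an alternative to starts-with; the helper names are hypothetical, and the Selenium import is kept inside msc_codes so the pure helpers can be tried on their own.

```python
HREF_PREFIX = "/mathscinet/search/mscdoc.html?code="


def href_prefix_selector(prefix: str = HREF_PREFIX) -> str:
    """CSS selector for <a> elements whose href attribute starts with prefix."""
    return f'a[href^="{prefix}"]'


def normalize_whitespace(raw: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(raw.split())


def msc_codes(driver) -> list:
    """Return the cleaned text of every matching link on the current page."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By

    links = driver.find_elements(By.CSS_SELECTOR, href_prefix_selector())
    return [normalize_whitespace(link.text) for link in links]
```

For the first example link, normalize_whitespace would turn the raw text into "65J22 (35R30 47A52 65J20 65R30 90C30)".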
If I want the first element matching a selector, without having to index into the result of find_elements_by_xpath(), what are my options? The goal is to write less code and to make sure that any changes to the source I am scraping can be accommodated easily. Is it possible to do something like
find_elements_by_css_selector('source[1]')
This code does not work as is though.
I am using Selenium with Python and will likely be using PhantomJS as the webdriver (Firefox for testing).
In CSS selectors, square brackets select attributes, so your sample code is trying to select a 'source' element with an attribute named 1, e.g.:
<source 1="your_element" />
Whereas I gather you're trying to find the first in a list that looks like this:
<source>Blah</source>
<source>Rah</source>
If you just want the first matching element, you can use the singular form:
element = driver.find_element_by_css_selector("source")
The plural form you were using returns a list, so you can also take index n-1 to get the nth instance on the page (lists index from 0):
element = driver.find_elements_by_css_selector("source")[0]
Finally, if you want your CSS selectors to be completely explicit in which element they're finding, you can use the nth-of-type selector:
element = driver.find_element_by_css_selector("source:nth-of-type(1)")
You might find some other helpful information at this blog post from Sauce Labs to help you write flexible selectors to replace your XPath.
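One practical difference between the two forms: the singular call raises NoSuchElementException when nothing matches, while the plural call returns an empty list. A small wrapper can make "first match or nothing" explicit. This is a sketch only; first_match is a hypothetical helper, and it assumes the Selenium 4 style find_elements(by, value) signature, with the locator strategy written as the plain string that By.CSS_SELECTOR stands for.

```python
CSS_SELECTOR = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def first_match(driver, css: str):
    """Return the first element matching css, or None if there is no match."""
    matches = driver.find_elements(CSS_SELECTOR, css)
    return matches[0] if matches else None
```

A call such as first_match(driver, "source") keeps the selector in one place, so a change in the scraped page only requires editing that string.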