XPath - Select all <p> elements does not work - python

I have some basic Selenium code and an XPath expression that performs well.
The XPath:
/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr//td/div[5]/table/tbody/tr[2]
selects the section I'm interested in, containing many <p> elements.
However, appending '//p' like so:
/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr//td/div[5]/table/tbody/tr[2]//p
does NOT select all those <p> elements. Instead, what I end up with is a single element.
I'm obviously missing something basic. This is an example of what my code looks like:
#!/usr/bin/env python
from selenium import webdriver
from time import sleep
fp = webdriver.FirefoxProfile()
wd = webdriver.Firefox(firefox_profile=fp)
wd.get("http://someurl.html")
# appending //p here is the problem that finds only a single <p> element
elems = wd.find_element_by_xpath("/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr/td/div[5]/table/tbody/tr[2]//p")
print elems.get_attribute("innerHTML").encode("utf-8", 'ignore')
wd.close()
EDIT: solved by using find_element*s*_by_xpath instead of find_element_by_xpath, as suggested (thanks, Alexander Petrovich, for spotting this).

Don't use such locators. Shorten them a bit, to something like //table[@attr='value']/tbody/tr[2]//p
To select multiple elements, use the find_elements_by_xpath() method (it returns a list of WebElement objects).
You will not be able to call elems.get_attribute() on the list. Instead, you'll have to iterate through it:
elems = wd.find_elements_by_xpath("/your/xpath")
for el in elems:
    print '\n' + el.get_attribute('innerHTML').encode("utf-8", 'ignore')
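The first-match behaviour behind the question can be illustrated without a browser using Python's standard-library xml.etree.ElementTree on made-up markup (the tags below are invented for the demo): find() mirrors find_element_by_xpath() in stopping at the first match, while findall() mirrors find_elements_by_xpath() in returning every match.

```python
import xml.etree.ElementTree as ET

# Invented markup standing in for the page section matched by the XPath.
html = """
<tr>
  <td><p>first paragraph</p></td>
  <td><p>second paragraph</p></td>
  <td><p>third paragraph</p></td>
</tr>
"""
row = ET.fromstring(html)

# find() mirrors find_element_by_xpath(): it stops at the FIRST match.
first = row.find(".//p")
print(first.text)  # only one <p>, no matter how many exist

# findall() mirrors find_elements_by_xpath(): it returns every match.
for p in row.findall(".//p"):
    print(p.text)
```

The same rule holds in Selenium: an XPath that matches many nodes still yields a single WebElement from find_element_by_xpath, which is why the question's //p suffix appeared not to work.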

Related

Selenium - how to make find_elements results readable

Basic concept I know:
find_element = finds a single element. We can use .text or get_attribute('href') to make the element readable. Since find_elements returns a list, we can't use .text or get_attribute('href') on it, otherwise it shows no such attribute.
To scrape readable information from find_elements, we can use a for loop:
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
for i in vegetables_search:
    print(i.text)
Here is my problem: when I use find_element, it shows the same result every time. I searched the problem on the internet and the answer said that it's because find_element returns just a single result. Here is my code, which hopes to grab different urls.
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
But I don't know how to combine the results in pandas. If I run this code, the links column contains the same url repeated in the csv file...
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
Product_name = []
links = []
for search in vegetables_search:
    Product_name.append(search.find_element(By.TAG_NAME, "h4").text)
    links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
# use pandas to export the information
df = pd.DataFrame({'Product': Product_name, 'Link': links})
df.to_csv('name.csv', index=False)
print(df)
Certainly, if I use the loop on its own, it shows different links. (That means my XPath is correct(!?))
product_link = driver.find_elements(By.XPATH, "//a[@rel='noopener']")
for i in product_link:
    print(i.get_attribute('href'))
My questions:
Besides using a for loop, how can I make find_elements readable, just like find_element(By.attribute, 'content').text?
How do I take my code a step further? I cannot print out different urls.
Thanks so much. ORZ
This is the HTML code inspected from the website: (screenshot omitted)
This line:
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
should be changed to:
links.append(search.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href') will always search for the first element in the DOM matching the .//a[@rel='noopener'] XPath locator, while you want to find the match inside another element.
To do so, you need to replace the WebDriver driver object with the WebElement search object you want to search inside, as shown above.
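The effect of scoping the search to each wrapper can be reproduced without a browser using the standard library's xml.etree.ElementTree; the markup, class name, and rel attribute below are invented to mirror the question.

```python
import xml.etree.ElementTree as ET

# Invented markup: two product wrappers, each with its own link.
html = """
<div>
  <div class="product-brief-wrapper">
    <h4>Carrot</h4><a rel="noopener" href="/p/carrot">buy</a>
  </div>
  <div class="product-brief-wrapper">
    <h4>Potato</h4><a rel="noopener" href="/p/potato">buy</a>
  </div>
</div>
"""
root = ET.fromstring(html)
wrappers = root.findall(".//div[@class='product-brief-wrapper']")

# Searching from the ROOT each time (like driver.find_element)
# always lands on the first link:
broken = [root.find(".//a[@rel='noopener']").get("href") for _ in wrappers]
print(broken)  # the same href twice

# Searching from EACH wrapper (like search.find_element) scopes the match:
fixed = [w.find(".//a[@rel='noopener']").get("href") for w in wrappers]
print(fixed)   # distinct hrefs
```

This is exactly the bug in the question's loop: the search variable was bound per iteration but the link lookup still started from driver, so every iteration appended the first link on the page.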

Selenium Python - Store XPath in var and extract deeper hierarchy XPath from var

I sadly couldn't find any resources online for my problem. I'm trying to store elements found by XPath in a list and then loop over those elements to search within each object. But instead of searching within the given object, it seems that Selenium is always searching the whole site again.
Anyone with good knowledge about this? I've seen that:
// Selects nodes in the document from the current node that match the selection no matter where they are
But I've also tried "/" and it didn't work either.
Instead of giving me the text for each div, it gives me the text from all divs.
My Code:
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# I'm looking for all divs with a specific class and store them in a list
divs_found = driver.find_elements_by_xpath("//div[@class='a-fixed-right-grid-col a-col-left']")
# Here seems to be the problem: instead of searching within "divs_found[1]" it behaves like "driver" and searches the whole site
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
# Now I'm looking in the found href matches to store the text from it
for href in hrefs_matching_in_div:
    result_text.append(href.text)
print(result_text)
You need to prefix the XPath with . so the search is relative to the element instead of the whole document. Try now:
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")

Selenium Python: How can I count the number of tables in a div?

I'm trying to find an element with Xpath but it changes like so:
//*[@id="emailwrapper"]/div/div/table[1]/tbody/tr/td[2]/a
//*[@id="emailwrapper"]/div/div/table[2]/tbody/tr/td[2]/a
//*[@id="emailwrapper"]/div/div/table[3]/tbody/tr/td[2]/a
//*[@id="emailwrapper"]/div/div/table[4]/tbody/tr/td[2]/a
//*[@id="emailwrapper"]/div/div/table[5]/tbody/tr/td[2]/a
//*[@id="emailwrapper"]/div/div/table[6]/tbody/tr/td[2]/a
My current assumption is that the table I'm looking for will always be the last one in the table array, but I want to confirm this by counting the number of tables in the second div. Does anyone know how to do this?
A simple solution is using the below XPath:
//*[@id='emailwrapper']/div/div/table
Your code should be:
lastTable = len(driver.find_elements_by_xpath("//*[@id='emailwrapper']/div/div/table"))-1
print lastTable
Assuming there is at least one element matching the XPath '//*[@id="emailwrapper"]/div/div/table', you can simply do:
driver.find_elements_by_xpath('//*[@id="emailwrapper"]/div/div/table')
It will return a list; unlike find_element_by_xpath(), find_elements does not raise NoSuchElementException -- if nothing matches, the list is simply empty.
Exact same results but written differently:
from selenium.webdriver.common.by import By
driver.find_elements(By.XPATH, '//*[@id="emailwrapper"]/div/div/table')
After which you can call len() on the list to see how many elements matched.
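Counting matches and taking the last one works the same on any result list; here is a browser-free sketch with the standard library's xml.etree.ElementTree, on markup invented to stand in for the #emailwrapper structure:

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the emailwrapper structure from the question.
html = """
<body>
  <div id="emailwrapper">
    <div><div>
      <table><tr><td/><td><a href="/m/1">one</a></td></tr></table>
      <table><tr><td/><td><a href="/m/2">two</a></td></tr></table>
      <table><tr><td/><td><a href="/m/3">three</a></td></tr></table>
    </div></div>
  </div>
</body>
"""
root = ET.fromstring(html)

# One query matching ALL tables, like find_elements_by_xpath:
tables = root.findall(".//div[@id='emailwrapper']/div/div/table")
print(len(tables))  # number of tables under the second div

# The last table is just the last list element; its index never needs
# to be known up front:
last = tables[-1]
print(last.find(".//a").get("href"))
```

Negative indexing ([-1]) avoids the len(...)-1 arithmetic entirely once the matches are in a Python list.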

Python: Remove repeated code for repeated action in Selenium

I am working in Python with Selenium and have to do this:
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
I am doing this because there are 4 elements on the page that have the CSS selector "#results .page_block_sub_header_count", and I want to get the result of the 4th one.
Is there a good way to put this into code? I do not want 4 similar lines, and I believe this is not considered good code practice.
Just use find_elements_by_css_selector() (note the "s") to locate multiple elements matching a locator - a CSS selector in your case:
results = driver.find_elements_by_css_selector('#results .page_block_sub_header_count')
results would be a list of WebElement instances; you can iterate over it and get, say, the text:
for item in results:
    print(item.text)
Without the HTML it is tough to provide the best-fit solution. However, as I can see from the code you tried, you have used:
driver.find_element_by_css_selector('#results .page_block_sub_header_count')
This essentially means the node with id results has at least 4 child nodes with class page_block_sub_header_count. To construct the best-fit CSS selector we are missing the tagName that carries the class page_block_sub_header_count, which will be available in the HTML DOM.
Still, if you want to get the result of the 4th one, you can use the following line of code:
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count:nth-of-type(4)')
Also you can use the following code (just in case):
elements = driver.find_elements_by_css_selector("#results .page_block_sub_header_count")
for index in range(len(elements)):
    print(elements[index].text)
PS: @alecxe's code is better for usage.
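When only the 4th match is wanted, plain list indexing after a single find_elements call is enough; a browser-free sketch with the standard library's xml.etree.ElementTree (the markup below is invented to stand in for "#results .page_block_sub_header_count"):

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the #results block from the question.
html = """
<div id="results">
  <span class="page_block_sub_header_count">12</span>
  <span class="page_block_sub_header_count">34</span>
  <span class="page_block_sub_header_count">56</span>
  <span class="page_block_sub_header_count">78</span>
</div>
"""
root = ET.fromstring(html)

# One query for all matches, like find_elements_by_css_selector:
counts = root.findall(".//span[@class='page_block_sub_header_count']")

# The fourth match is simply index 3 of the list:
print(counts[3].text)  # 78
```

This replaces the four duplicated find_element lines with one lookup plus an index, and sidesteps the :nth-of-type subtleties entirely.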

Indexing over the results returned by selenium

I am trying to index over the results returned by an XPath. For example:
xpath = '//a[@id="someID"]'
can return a few results, and I want to get a list of them. I thought that doing:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults+1):
    l.append(sel.get_text('(%s)[%d]' % (xpath, i)))
would work, because doing something similar with Firefox's XPath Checker works:
(//a[@id='someID'])[2]
returns the 2nd result.
Any ideas why the behavior would be different, and how to do such a thing with Selenium?
Thanks
Can you try the XPath /html/descendant::a[@id="someID"]? You can replace the /html with something else that is an ancestor of your links, like id('content'). You should then be able to locate individual links using [1], [2], etc.
From the XPath TR at http://www.w3.org/TR/xpath#path-abbrev:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
The answer is that you need to tell Selenium that the locator you pass to get_text() is an XPath, by prefixing it with xpath=:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults+1):
    l.append(sel.get_text('xpath=(%s)[%d]' % (xpath, i)))
In Selenium you normally do it without the extra brackets, so your loop would look like the following:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults+1):
    l.append(sel.get_text('%s[%d]' % (xpath, i)))
And that will produce a valid XPath in Selenium, like //a[@id='someID'][2]
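Whichever bracket form is used, each get_text call re-evaluates an XPath against the page; an alternative pattern is to fetch all matches once and index the resulting Python list. A browser-free sketch of that pattern with the standard library's xml.etree.ElementTree (markup invented for the demo):

```python
import xml.etree.ElementTree as ET

# Invented markup: several links sharing the id from the question.
html = """
<body>
  <a id="someID" href="/1">first</a>
  <a id="someID" href="/2">second</a>
  <a id="someID" href="/3">third</a>
</body>
"""
root = ET.fromstring(html)

# One query collects every match; indexing then happens in Python,
# so no per-element XPath like (//a[@id='someID'])[2] is needed.
links = root.findall(".//a[@id='someID']")
texts = [a.text for a in links]
print(texts[1])  # the 2nd result, matching (//a[@id='someID'])[2]
```

Collecting once also avoids the //a[1]-vs-(//a)[1] ambiguity quoted from the XPath TR above, since all positional selection is done on the list.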
