Get the List of Multiple Elements with the Same ClassName - Python

I'd like to crawl the official WTO page for every case whose panel has already been composed.
As you can see on the page (https://www.wto.org/english/tratop_e/dispu_e/dispu_status_e.htm), every case is indexed with "DS XXX", and right below that it shows whether a panel has been composed ("Panel composed") or the dispute is still "In consultations".
If I inspect them, they all share the same element:
<p class = "panel-text-simple">
So I tried the following two commands:
elem_info = driver.find_element_by_class_name("panel-title-simple")
elem_info = driver.find_element_by_xpath("//p[@class='panel-title-simple']")
but both of them only give me the topmost case, the most recent one.
I need to locate every case's info and then loop over the cases to check whether the panel has been composed.
How could I do that?

Use find_elements (note the 's'). This returns a list that you can then loop through:
documents = driver.find_elements_by_class_name("panel-title-simple")
for document in documents:
    # continue with your code, e.g. inspect each element's text
    print(document.text)
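For intuition, the same first-match versus all-matches split exists in Python's standard xml.etree.ElementTree: find() returns only the first match (compare find_element), while findall() returns a list of every match (compare find_elements). This is not Selenium code; the markup and the name PAGE are hypothetical, simplified stand-ins for the WTO status list, whose real structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real WTO page is more complex.
PAGE = """
<ul>
  <li><small>DS 600</small><p class="panel-title-simple">In consultations</p></li>
  <li><small>DS 599</small><p class="panel-title-simple">Panel composed</p></li>
  <li><small>DS 598</small><p class="panel-title-simple">Panel composed</p></li>
</ul>
"""

root = ET.fromstring(PAGE)
path = ".//p[@class='panel-title-simple']"

first = root.find(path)     # only the first match (compare find_element)
every = root.findall(path)  # a list of every match (compare find_elements)

print(first.text)
for p in every:
    print(p.text)
```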

You can use the XPath below to get all the LIs that have a current status of 'Panel composed'
//li[.//p[contains(.,'Panel composed')]]
From there you can get the DS number
.//small
or the details
./p
and so on.
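A rough offline sketch of that two-step idea (select the matching list items, then run .//small relative to each one), using the standard library's ElementTree on hypothetical markup. ElementTree's XPath subset has no contains(), so the "Panel composed" substring test is done in Python here; in Selenium you would pass the XPath above as-is.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <li> per dispute, as the answer assumes.
PAGE = """
<ul>
  <li><small>DS 600</small><p>Current status: In consultations</p></li>
  <li><small>DS 599</small><p>Current status: Panel composed</p></li>
  <li><small>DS 598</small><p>Current status: Panel composed</p></li>
</ul>
"""

root = ET.fromstring(PAGE)

composed = []
for li in root.iter("li"):
    # Same intent as li[.//p[contains(., 'Panel composed')]]
    statuses = ("".join(p.itertext()) for p in li.iter("p"))
    if any("Panel composed" in s for s in statuses):
        # Relative lookup: searches only inside this <li>
        composed.append(li.find(".//small").text)

print(composed)
```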

Related

Extracting element IDs of required forms in selenium/python

I'm crawling this page (https://boards.greenhouse.io/reddit/jobs/4330383) using Selenium in Python and am looping through all of the required fields using:
required = driver.find_elements_by_css_selector("[aria-required=true]")
The problem is that I can't view each element's id. The command required[0].id (which is the same as driver.find_element_by_id("first_name").id) returns a long string of alphanumeric characters and hyphens, even though the id is first_name in the HTML. Can someone explain to me why the id is being changed from first_name to this string? And how can I view the actual id that I'm expecting?
Additionally, what would be the simplest way to reference the associated label mentioned right before it in the HTML (i.e. "First Name " in this case)? The goal is to loop through the required list and be able to tell what each of these forms is actually asking the user for.
Thanks in advance! Any advice or alternatives are welcome, as well.
Your code is almost right. All you need to do is use the .get_attribute() method to get your ids:
required = driver.find_elements_by_css_selector("[aria-required=true]")
for r in required:
    print(r.get_attribute("id"))
driver.find_element_by_id("first_name") returns a WebElement object. Its .id property is Selenium's internal identifier for that element (the long string you are seeing), not the HTML id attribute.
In order to get the value of a web element attribute such as href or id, the get_attribute() method should be called on the WebElement object.
So you need to change your code to:
driver.find_element_by_id("first_name").get_attribute("id")
This will give you the value of that element's id attribute.
I'm going to answer my second question ("How to reference the element's associated label?") since I just figured it out using the find_element_by_xpath() method in conjunction with the .get_attribute("id") solutions mentioned in the previous answers:
ele_id = driver.find_element_by_id("first_name").get_attribute("id")
label_text = driver.find_element_by_xpath('//label[@for="{}"]'.format(ele_id)).text
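Putting the answers together, the loop the question describes (each required field together with its label text) can be sketched offline with the standard library's ElementTree. The form markup and the names FORM and fields are hypothetical, a trimmed-down stand-in rather than the real Greenhouse page; in Selenium the same steps would use get_attribute("id") and the label lookup from the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified form markup (not the actual Greenhouse page).
FORM = """
<form>
  <label for="first_name">First Name </label>
  <input id="first_name" aria-required="true"/>
  <label for="last_name">Last Name </label>
  <input id="last_name" aria-required="true"/>
  <label for="phone">Phone</label>
  <input id="phone"/>
</form>
"""

root = ET.fromstring(FORM)

fields = {}
for field in root.findall(".//*[@aria-required='true']"):
    field_id = field.get("id")  # the HTML id attribute, like get_attribute("id")
    label = root.find(f".//label[@for='{field_id}']")
    fields[field_id] = label.text.strip() if label is not None else None

print(fields)
```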

Python Selenium only getting first row when iterating over table

I am trying to extract the most recent headlines from the following news site:
http://news.sina.com.cn/hotnews/
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

# save ids of relevant buttons that need to be clicked on the site
buttons_ids = ['Tab21', 'Tab22', 'Tab32']
# save ids of relevant subsections
con_ids = ['Con11']
# start webdriver, go to site, hover over buttons
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()
Then I iterate through each section that I am interested in, and within each section through all the headlines, which are rows in an HTML table. However, every iteration returns the first element:
for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))
I also tried the following approach by essentially saving the table as a list and then iterating through the rows:
for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))
In the second case I get exactly the number of headlines in the section, so it apparently picks up the number of rows correctly. However, it still returns only the first row on every iteration. Where am I going wrong? I know a similar question has been asked here: "Selenium Python iterate over a table of rows it is stopping at the first row", but I am still unable to figure out where I am going wrong.
In XPath, queries that begin with // will search relative to the document root; so even though you're calling find_element_by_xpath() on the correct container element, you're breaking out of that scope, thereby performing the same global search and yielding the same result every time.
To constrain your query to descendants of the current element, begin your query with .//, for example:
text = headline.find_element_by_xpath(".//td[2]/a")
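To see the scoping rule in action without a browser, here is a small sketch using the standard library's ElementTree on hypothetical table markup loosely modelled on the question (the real Sina page will differ, and PAGE and results are illustrative names). Each row is fetched once, and the cell lookups start with ./, so they stay inside that row and return a different headline per iteration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on the question's table.
PAGE = """
<body>
  <div id="Con11">
    <table><tbody>
      <tr><td>1</td><td><a href="/a">Headline A</a></td><td><a>10</a></td></tr>
      <tr><td>2</td><td><a href="/b">Headline B</a></td><td><a>20</a></td></tr>
      <tr><td>3</td><td><a href="/c">Headline C</a></td><td><a>30</a></td></tr>
    </tbody></table>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

results = []
for row in root.findall(".//div[@id='Con11']/table/tbody/tr"):
    # Paths start with "./", so each lookup is confined to the current row
    link = row.find("./td[2]/a")
    comments = row.find("./td[3]/a")
    results.append((link.text, link.get("href"), comments.text))

for item in results:
    print(item)
```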
Try this:
for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        print("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        headline = driver.find_element_by_xpath("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        value = headline.find_element_by_xpath(".//td[2]/a")
        print(value.get_attribute("innerText").encode('utf-8'))
I am able to get the headlines with the code above.
I was able to solve it by specifying the entire XPath in one go like this:
headline = driver.find_element_by_xpath("(//*[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]/td[2]/a)")
print(headline.get_attribute("innerText"))
print(headline.get_attribute("href"))
rather than splitting it into two parts.
My only explanation for why it prints only the first row repeatedly is that some odd JavaScript on the page prevents proper iteration when the request is split into two parts.
Or my first version had a syntax error that I am not aware of.
If anyone has a better explanation, I'd be glad to hear it!

How to get all matching elements without scrolling using robot framework and python?

I need to get ALL elements matching an XPath into a list without having to scroll to them. I can never know whether scrolling is really needed (it depends on the monitor resolution, and the developers can also change the column widths), and I would not know where to scroll anyway, because I am checking for the existence of the columns, and in some cases the correct result is that the column is not there.
I use these functions in the test to get the list of column headers:
Get all columns of table ${table}
    ${columns_loc}=    replace substring placeholder    ${GRID_HEADERS}    ${table}
    @{locators}=    Get Webelements    ${columns_loc}
    ${result}=    Create List
    :FOR    ${locator}    IN    @{locators}
    \    ${name}=    Get Text    ${locator}
    \    Append To List    ${result}    ${name}
    [Return]    ${result}

Check column ${column} is visible
    ${headers2}=    Get all values
    ${res}=    value is in list    ${headers2}    ${column}
    run keyword if    ${res}==False    fail    wrong columns displayed
    ...    ELSE    pass execution
I also tried to use this:
def dict_elements(self, xpath):
    elements = self.get_library_instance()._element_find(xpath, False, True)
    headers_map = {}
    if elements is not None:
        for index, element in enumerate(elements):
            headers_map[element.text] = index
        return headers_map
    return None

def list_elements(self, xpath):
    dict = self.dict_elements(xpath)
    return dictionary_keys_to_list(dict)
Get all columns of table ${table}
    ${columns_loc}=    replace substring placeholder    ${GRID_HEADERS}    ${table}
    @{cols}=    list elements    ${columns_loc}
But both variants return 9 columns, which is correct (there really are 9 columns), but the last one is EMPTY (just u'' is returned) instead of the actual text of the element (like u'last column name'). This last column header is not directly visible; it has to be scrolled into view horizontally.
The table is dynamically generated by Angular.
Thanks for any advice!
You can use Get Matching Xpath Count. Refer to the Get Matching Xpath Count keyword in the documentation for more info.
If the app really needs an element to be visible in the browser's viewport to fill in its text value, there's a "trick" that works for me: use the Selenium2Library keyword Mouse Over.
Calling it physically scrolls the page in the browser so the element becomes visible. I've tried the other usual solutions (calculating the absolute/relative position of the element and calling JavaScript scrollTo(), etc.), but they were unreliable: they worked on some elements and not on others, behaved differently across browsers, and so on. Mouse Over, surprisingly (for me at least :), just works.
So in your code sample:
:FOR    ${locator}    IN    @{locators}
\    Mouse Over    ${locator}
\    ${name}=    Get Text    ${locator}
\    Append To List    ${result}    ${name}
Hope this'll help.
By the way, your question is very context-specific. In general, please include or link screenshots, the HTML source, and the source of the relevant keywords (is value is in list the same as the built-in Should Contain?) so we can understand the issue better. My 2 ¢. ;)

Scrapy SgmlLinkExtractor how to scrape li tags with changing ids

How can I get the element at this specific location?
The XPath is:
//*[@id="id316"]/span[2]
I got this path from the Google Chrome browser. I basically want to retrieve the number at this specific location with the following statement:
zimmer = response.xpath('//*[@id="id316"]/span[2]').extract()
However I'm not getting anything but an empty string. I found out that the id value is different for each element in the list I'm interested in. Is there a way to write this expression such that it works for generic numbers?
Use the corresponding label and get the following sibling element containing the value:
//span[. = 'Zimmer']/following-sibling::span/text()
Note also the bonus in readability of the locator.
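Python's standard ElementTree does not support the following-sibling axis, so the sketch below approximates the same idea by hand: find each element that has a span labelled "Zimmer" and take the first <span> after that label, whatever the container's id is. The markup, the ids and the names PAGE and rooms are hypothetical; in Scrapy you would simply pass the XPath above to response.xpath().

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: the container ids change, the 'Zimmer' label does not.
PAGE = """
<ul>
  <li id="id316"><span>Zimmer</span><span>3</span></li>
  <li id="id402"><span>Etage</span><span>2</span></li>
  <li id="id517"><span>Zimmer</span><span>2.5</span></li>
</ul>
"""

root = ET.fromstring(PAGE)

rooms = []
# [span='Zimmer'] matches elements with a <span> child whose text is exactly 'Zimmer'
for container in root.findall(".//*[span='Zimmer']"):
    children = list(container)
    label_pos = next(i for i, c in enumerate(children)
                     if c.tag == "span" and c.text == "Zimmer")
    # First <span> after the label, like following-sibling::span[1]
    value = next((c for c in children[label_pos + 1:] if c.tag == "span"), None)
    if value is not None:
        rooms.append(value.text)

print(rooms)
```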

Scan through row elements whose classname is identical

I have a number of links in rows on a web page, and their class names are the same.
I am able to click the first link occurrence using this XPath:
"(//span[@class='odds black'])"
However, I want to scan through the particular row and click on each odds link (if it is present).
Any help on how to achieve this?
Note: I cannot find the elements using other attributes, as those change dynamically with the data.
Instead of using the XPath in this format:
"(//span[@class='odds black'])"
could you use an absolute path in this format:
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[1]/td[5]/a/span[2]/span/span
(you can get this format easily by selecting an element in Firebug, right-clicking its code, and selecting Copy XPath).
I have found in many instances that I can add a counter for tr[1] or some other path step in order to move down rows quite accurately. I can't see your site to compare against the XPath below, but I imagine it would be something like:
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[1]/td[5]/a/span[2]/span/span
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[2]/td[5]/a/span[2]/span/span
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[3]/td[5]/a/span[2]/span/span
Then you can add a counter such as i, increment it in the loop, and build the path along the lines of:
"/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr["+str(i)+"]/td[5]/a/span[2]/span/span"
Assuming that the class name will always be 'odds <color>', you can use XPath's contains() function, like this:
"//span[contains(@class,'odds')]"
This will return all spans whose class attribute contains the string 'odds'.
CSS selectors are class-aware, so it would make more sense to me to use:
span.odds
XPath treats class as a plain string, which forces you to use contains(), whereas CSS lets you match individual classes.
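The difference described in the last answer can be sketched with the standard library: a substring test on the class attribute (what contains(@class, 'odds') does) also matches an unrelated class such as a hypothetical 'oddsmaker', while splitting the attribute into tokens matches the class exactly, as the CSS selector span.odds does. In pure XPath 1.0 the usual workaround is contains(concat(' ', normalize-space(@class), ' '), ' odds '). The markup and the helper has_class are made up for illustration, and the sketch walks the rows one at a time, as the question wants.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows; the 'oddsmaker' span is a deliberate near-miss.
PAGE = """
<table>
  <tr><td><span class="odds black">1.50</span></td><td><span class="odds black">2.10</span></td></tr>
  <tr><td><span class="oddsmaker">n/a</span></td><td><span class="odds red">3.00</span></td></tr>
</table>
"""

def has_class(elem, name):
    """Token match on the class attribute, like the CSS selector span.odds."""
    return name in (elem.get("class") or "").split()

root = ET.fromstring(PAGE)

# Substring match, like contains(@class, 'odds'): also catches 'oddsmaker'
substring_hits = [s.text for s in root.iter("span") if "odds" in (s.get("class") or "")]

# Row-by-row scan with an exact class-token match
per_row = [[s.text for s in row.iter("span") if has_class(s, "odds")]
           for row in root.iter("tr")]

print(substring_hits)
print(per_row)
```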
