I'm crawling this page (https://boards.greenhouse.io/reddit/jobs/4330383) using Selenium in Python and am looping through all of the required fields using:
required = driver.find_elements_by_css_selector("[aria-required=true]")
The problem is that I can't view each element's id. The expression required[0].id (which is the same as driver.find_element_by_id("first_name").id) returns a long string of alphanumeric characters and hyphens, even though the id is first_name in the HTML. Can someone explain to me why the id is being changed from first_name to this string? And how can I view the actual id that I'm expecting?
Additionally, what would be the simplest way to reference the associated label mentioned right before it in the HTML (i.e. "First Name " in this case)? The goal is to loop through the required list and be able to tell what each of these forms is actually asking the user for.
Thanks in advance! Any advice or alternatives are welcome, as well.
Your code is almost right. All you need to do is use the .get_attribute() method to get your ids:
required = driver.find_elements_by_css_selector("[aria-required=true]")
for r in required:
    print(r.get_attribute("id"))
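driver.find_element_by_id("first_name") returns a web element object. The .id property of that object is Selenium's internal reference to the element, not the id attribute from the HTML, which is why you see a long generated string.
To get the value of an element attribute such as href or id, call the get_attribute() method on the web element object.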
So, you need to change your code to be
driver.find_element_by_id("first_name").get_attribute("id")
This will give you the id attribute value of that element
I'm going to answer my second question ("How to reference the element's associated label?") since I just figured it out using the find_element_by_xpath() method in conjunction with the .get_attribute("id") solutions mentioned in the previous answers:
ele_id = driver.find_element_by_id("first_name").get_attribute("id")
label_text = driver.find_element_by_xpath('//label[@for="{}"]'.format(ele_id)).text
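The same id-to-label lookup can be tried offline, without a browser. The following is only a sketch: it uses the standard library's ElementTree instead of Selenium, the FORM fragment is made up (the real page's markup will differ), and required_labels is a hypothetical helper name. It assumes well-formed markup and that each label points at its field through the for attribute.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment standing in for the real form (assumption).
FORM = """
<form>
  <label for="first_name">First Name</label>
  <input id="first_name" aria-required="true"/>
  <label for="email">Email</label>
  <input id="email" aria-required="true"/>
  <input id="phone"/>
</form>
"""


def required_labels(markup):
    """Map each required field's id to the text of its <label for=...>."""
    root = ET.fromstring(markup)
    labels = {}
    # Same idea as the CSS selector [aria-required=true].
    for field in root.findall(".//*[@aria-required='true']"):
        field_id = field.get("id")
        # Same idea as //label[@for="..."] in the Selenium version.
        label = root.find(".//label[@for='{}']".format(field_id))
        labels[field_id] = label.text.strip() if label is not None else None
    return labels


print(required_labels(FORM))
```

With Selenium the loop body stays the same in spirit: read the id with get_attribute("id"), then look up the label whose for attribute matches it.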
I am very new to this and I have tried to look for the answer, but I have been unable to find one.
I am using Selenium+chromedriver, trying to monitor some items I am interested in.
Example:
a page with 20 items in a list.
Code:
#list of items on the page
search_area = driver.find_elements_by_xpath("//li[@data-testid='test']")
search_area[19].find_element_by_xpath("//p[@class='sc-hKwDye name']").text
This returns the name of item[0].
search_area[19].find_element_by_css_selector('.name').text
This returns the name of item[19].
Why is XPath looking at the parent HTML?
I want XPath to return the name of the item within the WebElement (the list item). Is that possible?
Found the answer: add a . in front of the expression.
Hope this helps someone new like me in the future.
From
search_area[19].find_element_by_xpath("//p[@class='sc-hKwDye name']").text
to
search_area[19].find_element_by_xpath(".//p[@class='sc-hKwDye name']").text
The expression passed in find_element_by_xpath("//p[@class='sc-hKwDye name']") starts with //, which XPath evaluates from the document root even when it is called on an element. Prefix it with . to search within the element, or pass the full XPath of the target to get the desired result.
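The effect of scoping a search to one list item can be checked offline. This is a rough sketch using the standard library's ElementTree rather than Selenium; the PAGE fragment and its item names are made up. ElementTree only supports the relative .// form on elements, so the search from the root stands in for what the un-dotted // expression does in Selenium.

```python
import xml.etree.ElementTree as ET

# Made-up list standing in for the real page (assumption).
PAGE = """
<ul>
  <li data-testid="test"><p class="name">item 0</p></li>
  <li data-testid="test"><p class="name">item 1</p></li>
  <li data-testid="test"><p class="name">item 2</p></li>
</ul>
"""

root = ET.fromstring(PAGE)
items = root.findall(".//li[@data-testid='test']")

# Scoped to the last <li>: only its own <p> can match.
scoped_name = items[-1].find(".//p[@class='name']").text

# Searched from the document root: the first <p> on the page matches.
page_name = root.find(".//p[@class='name']").text

print(scoped_name, "|", page_name)
```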
I'm trying to get string:
Liquidity (Including Fees)
from line
<div class="sc-bdVaJa KpMoH css-1ecm0so">Liquidity (Including Fees)</div>
I've tried the following, but none of them gave me the string that I want:
usdbaslik = driver.find_element_by_css_selector("[class='sc-bdVaJa KpMoH css-1ecm0so']")
print(usdbaslik.text,":---text")
print(usdbaslik.tag_name,":---tag_name")
print(usdbaslik.id,":---id")
print(usdbaslik.size,":---size")
print(usdbaslik.rect,":---rect")
print(usdbaslik.location,":---location")
print(usdbaslik.location_once_scrolled_into_view,":---location_once_scrolled_into_view")
print(usdbaslik.parent,":---parent")
print(usdbaslik.screenshot_as_png,":--screenshot_as_png")
print(usdbaslik.screenshot_as_base64,":--screenshot_as_base64")
print(usdbaslik.__class__,":--__class__")
What am I doing wrong?
Thanks in advance.
There is (at least) one other element with that class on the page, so it's not a unique selector. The closest thing I was able to find to a unique selector looking at the page would be
usdbaslik = driver.find_elements_by_xpath('//div[@class="sc-VigVT fKQdIL"]//div[@class="sc-bdVaJa KpMoH css-1ecm0so"]')[0]
Then you can get the text from the label with
print(usdbaslik.get_attribute('innerText'))
I have created a script that parses through emails on a weekly basis looking for tables within specific emails. I know that I want things that are within a table tag with a specific class name. The goal then is to take those tables, essentially concat them with a tag in between, and put into another email to automatically send every week.
What I have so far is the actual email scraping, the sending of the email at the end, but I just don't know how to combine the results of a find_all into one element. I'm obviously open to different approaches, which is why I posed the question as thus.
What I have for code is this:
from bs4 import BeautifulSoup

def parse_messages(enhance_str):
    # Parse the email body and collect every table with the target class
    soup = BeautifulSoup(enhance_str, 'html.parser')
    table = soup.find_all('table', {'class': 'MsoNormalTable'})
    return table
which gives me a list-like object (I know the result of find_all subclasses list), but none of the list methods I know work with this object. I thought I could just do something like
'<br/>'.join(table)
but this throws an attribute error.
I'm sure there is a simple answer, but I can't see it. Any help is greatly appreciated.
EDIT: As a clarification, I was just trying to preserve the HTML structure of these tables so I can just pop them into a new email and send them as is. The solution below works for me, so I'm marking it as the accepted answer.
Thanks for the help!
The elements in the output list of soup.find_all are bs4.element.Tag objects, not some objects you can join together as-is to make a string.
I'm not sure what you're up to, but if you want to make them all a single str, you can iterate over the Tags, call str on each to get its string representation, and then join:
'<br/>'.join([str(tag) for tag in table])
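The convert-then-join step can be wrapped in a small helper. The sketch below is illustrative only: FakeTag is a hypothetical stand-in for bs4.element.Tag (an object that is not a str but renders to HTML through str()), used so the example runs without BeautifulSoup installed, and join_tables is a made-up name.

```python
class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: not a str, but str() gives HTML."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        return self.html


def join_tables(tables, separator="<br/>"):
    """Render each table to its HTML string, then join with the separator."""
    return separator.join(str(table) for table in tables)


tables = [
    FakeTag("<table><tr><td>a</td></tr></table>"),
    FakeTag("<table><tr><td>b</td></tr></table>"),
]
combined = join_tables(tables)
print(combined)
```

With real data, the list returned by parse_messages would be passed in place of the stand-in objects, and the combined string used as the HTML body of the outgoing email.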
I'd like to crawl every case whose Panel Report has already been composed from the official WTO page
(https://www.wto.org/english/tratop_e/dispu_e/dispu_status_e.htm).
Every case is indexed with "DS XXX", and right below that the page shows whether the case is "Panel Composed" or still "in Consultation".
If I inspect, they all share the same
<p class = "panel-text-simple">
So I tried the following two commands:
elem_info = driver.find_element_by_class_name("panel-title-simple")
elem_info = driver.find_element_by_xpath("//p[@class='panel-title-simple']")
but each of them only gives me the topmost case, the most recent one.
I need to locate every case's info and then write a for loop to check whether the panel has been composed or not.
How could I do that?
Use find_elements (note the 's'). This returns a list that you can then loop through:
documents = driver.find_elements_by_class_name("panel-title-simple")
for document in documents:
    pass  # continue with your code
You can use the XPath below to get all the LIs that have a current status of 'Panel composed'
//li[.//p[contains(.,'Panel composed')]]
From there you can get the DS number
.//small
or the details
./p
and so on.
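The filter-then-extract idea can be tried offline as well. The sketch below swaps in the standard library's ElementTree, which has no contains() function, so the status test is done in Python rather than in XPath. The PAGE fragment, its case numbers, and the composed_cases helper are all made up for illustration; the real page's structure may differ.

```python
import xml.etree.ElementTree as ET

# Made-up fragment with placeholder case numbers (assumption).
PAGE = """
<ul>
  <li><small>DS 3</small><p class="panel-title-simple">In consultations</p></li>
  <li><small>DS 2</small><p class="panel-title-simple">Panel composed</p></li>
  <li><small>DS 1</small><p class="panel-title-simple">Panel composed</p></li>
</ul>
"""


def composed_cases(markup, status="Panel composed"):
    """Return the DS number of every <li> whose <p> text contains the status."""
    root = ET.fromstring(markup)
    cases = []
    for li in root.iter("li"):
        # Python-side equivalent of li[.//p[contains(., status)]]
        if any(status in "".join(p.itertext()) for p in li.iter("p")):
            cases.append(li.find(".//small").text)
    return cases


print(composed_cases(PAGE))
```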
I want to create a function that fills in registration, authorization, and other text forms. Something like:
def fill_form(my_list):
    forms = driver.find_elements_by_xpath(common_xpath)
    for element in forms:
        element.send_keys(my_list[forms.index(element)])
It should get as arguments a list of text values to send into <input type="text">, <input type="password"> and <textarea> elements.
So far I have:
common_xpath = '//input[@type="text" or @type="password"]'
but I can't work out how to add the textarea element to this XPath so that it matches too.
A simpler and more future-proof strategy would be to separate the XPath expression into two or three distinct expressions that search for the required WebElements, but if you really need a single XPath expression, I imagine you could use the | operator to devise something along the lines of:
//input[@type="text" or @type="password"] | //textarea
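As a rough sketch of how the union expression could be plugged into the question's function: the version below takes the driver as a parameter and pairs fields with values using zip instead of forms.index(element). COMMON_XPATH and this fill_form signature are assumptions made for illustration. The literal "xpath" is the string value of Selenium's By.XPATH constant, used here so the snippet does not need to import selenium.

```python
# Union of the input and textarea expressions from the answer above.
COMMON_XPATH = '//input[@type="text" or @type="password"] | //textarea'


def fill_form(driver, values, xpath=COMMON_XPATH):
    """Type each value into the matching field, in document order.

    `driver` is assumed to be a Selenium WebDriver (or anything exposing
    find_elements(by, value) that returns objects with send_keys()).
    """
    # "xpath" is the string behind selenium's By.XPATH constant.
    fields = driver.find_elements("xpath", xpath)
    if len(fields) != len(values):
        raise ValueError(
            "expected {} values, got {}".format(len(fields), len(values))
        )
    for field, value in zip(fields, values):
        field.send_keys(value)
    return len(fields)
```

A call might look like fill_form(driver, ["alice", "s3cret", "Hello"]) for a page with one text input, one password input, and one textarea. Raising on a length mismatch is a design choice for the sketch; silently filling only the first few fields would hide a wrong XPath.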
I'm not sure whether this will do exactly what you're looking for, but from what I can see this XPath could get the job done:
common_xpath = "//*[self::input[@type='text'] or self::input[@type='password'] or self::textarea]"
I don't program in Python, but tried that one in Chrome's console (using $x("...")), and it seems to do what you want. You should consider calling that XPath inside the form (path/to/your/form//*...), to make it more specific.
TIL that you could select different tags in Xpath :)
Check this related answer for more info.
Finally, as a personal note, I'm not that experienced with Selenium, but I would suggest you consider the PageObject design pattern to make the tests easier to maintain. Hope it works for you.
I don't know Python, but this is what I would do in Java:
String [] commanXpath= {"text", "password"};
String xpathVar= "//input[@type='"+commanXpath[index]+"']";
System.out.println(xpathVar);
By common_xpath= By.xpath(xpathVar);
See if you can implement similar logic in Python. Can you also update the original post with the exact HTML tags?