In an HTML page there is this line:
<td data-sort="funny" class="coin-name tw-text-right" style="min-width: 60px;">
and I can find it by using this XPATH:
//tbody/tr/td[5]
But I am only interested in putting the value "funny" into a variable. Keep in mind that the word "funny" changes all the time, so I need to locate it and store it in a variable. How do I extract this changing text?
Thank you for helping ;-)
I am not sure this will work 100%, but here is one potential solution:
1. If you expand that tag, you will find that the first child's second child (see the image in this answer) has a unique id attribute.
2. You can then use that unique attribute and work your way up to the parent tag that has the data-sort attribute, using child-to-parent traversal in XPath. [The image shows the same approach][1]
3. Once you have uniquely identified the td tag, you can call getAttribute() (get_attribute() in the Python bindings) and store its value.
[1]: https://i.stack.imgur.com/9Dc2k.png
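For illustration, here is a minimal, browser-free sketch of the last step: locate the cell with the XPath from the question and read its data-sort attribute. It uses Python's standard-library xml.etree.ElementTree on a made-up, well-formed table fragment (the surrounding cells and the names SAMPLE and read_data_sort are assumptions, not part of the original page), so it only approximates a real page. With a Selenium WebElement in Python, the equivalent read would be element.get_attribute("data-sort").

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified markup; only the fifth cell mirrors the question.
SAMPLE = """
<table>
  <tbody>
    <tr>
      <td>1</td><td>2</td><td>3</td><td>4</td>
      <td data-sort="funny" class="coin-name tw-text-right">x</td>
    </tr>
  </tbody>
</table>
"""


def read_data_sort(markup: str) -> Optional[str]:
    """Return the data-sort value of the fifth cell in the first row, if any."""
    root = ET.fromstring(markup.strip())
    # Same path as //tbody/tr/td[5], written relative to the <table> root.
    cell = root.find("./tbody/tr/td[5]")
    return None if cell is None else cell.get("data-sort")


sort_value = read_data_sort(SAMPLE)
print(sort_value)
```

Because the value is read from the attribute each time, it does not matter that "funny" changes between page loads.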
Related
I'm trying to access text from elements that have different xpaths but very predictable href schemes across multiple pages in a web database. Here are some examples:
<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>
In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)"
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>
In this example I would want to extract "05C80 (05C15)". My web scraper would not be able to search by xpath directly due to the xpaths of my desired elements changing between pages, so I am looking for a more roundabout approach.
My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium can't directly search for hrefs, but I was thinking of doing something similar to this C# implementation:
Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))
To port this over to Python, the only analogous approach I could think of would be to use the in operator, but I am not sure how the syntax would work when everything is nested inside a find_element_by_xpath call. How would I bring all of these ideas together to obtain my desired text?
driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text
If I understand correctly, you want to locate all elements that share the same partial href. You can use this:
elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)
or, if you want to locate only one element:
driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text
Note that find_elements_by_xpath gives a list of all elements located, while find_element_by_xpath returns only the first match.
As per the HTML you have shared, @AndreiSuvorkov's answer would probably cater to your current requirement. You can, however, be more granular and construct a more precise XPath by:
Using starts-with instead of contains
Including the ?code= part of the @href attribute
Your effective code block will be:
all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
for elem in all_elements:
    print(elem.get_attribute("innerHTML"))
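To check the prefix-matching idea without a browser, the sketch below applies the same rule as the starts-with XPath to the two sample links from the question, using only Python's standard-library html.parser. It is not Selenium code: the class PrefixLinkCollector, the helper link_texts, and the third "unrelated" link are assumptions added for illustration.

```python
from html.parser import HTMLParser

PREFIX = "/mathscinet/search/mscdoc.html?code="


class PrefixLinkCollector(HTMLParser):
    """Collect the text of every <a> whose href starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.texts = []
        self._buffer = None  # text chunks while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse newlines and repeated spaces, like element.text would.
            self.texts.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def link_texts(markup, prefix=PREFIX):
    collector = PrefixLinkCollector(prefix)
    collector.feed(markup)
    collector.close()
    return collector.texts


SAMPLE = """
<a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)">
65J22 (35R30 47A52 65J20 65R30 90C30) </a>
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)">
05C80 (05C15) </a>
<a href="/other/page.html">unrelated</a>
"""

print(link_texts(SAMPLE))
```

The unrelated link is skipped because its href does not start with the prefix, which is the same filtering the XPath predicate performs.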
I am trying to find the input element with the name statusid_103408 and with the text() Draft.
Here is the XPath I am using; I am not sure where I am going wrong:
//input[@name='statusid_103408' and contains(text(), 'Draft')]
The reason this XPath does not work is that the text "Draft" does not actually belong to the input element. It is contained in the li element that is its parent. Therefore, your search returns no results.
I suggest using only the name in your XPath search (if it is unique). If you definitely need the text in your search, you can match the li element's text first and then find your input, like so:
//li[text()='Draft']/input[@name='statusid_103408']
Use the value attribute instead; it will work because the value is unique, whereas the text is not inside the input tag!
I am new to Python and BeautifulSoup. So please forgive me if I'm using the wrong terminology.
I am trying to get a specific 'text' from a div tag/element that has multiple attributes in the same tag.
<div class="property-item" data-id="183" data-name="Brittany Apartments" data-street_number="240" data-street_name="Brittany Drive" data-city="Ottawa" data-province="Ontario" data-postal="K1K 0R7" data-country="Canada" data-phone="613-688-2222" data-path="/apartments-for-rent/brittany-apartments-240-brittany-drive-ottawa/" data-type="High-rise-apartment" data-latitude="45.4461070" data-longitude="-75.6465360" >
Below is my code to loop through and find 'property-item'
for btnMoreDetails in citySoup.findAll(attrs= {"class":"property-item"}):
My question is, if I specifically want the 'data-name' and 'data-path' for example, how do I go about getting it?
I've searched google and even this website. Some were saying using the .contents[2]. But I still wasn't able to get any of it.
Once you have an element (looping over the findAll result yields them one at a time), you can access its attributes as though they were dictionary keys. So for example the following code:
import bs4
data = """<div class="property-item" data-id="183" data-name="Brittany Apartments" data-street_number="240" data-street_name="Brittany Drive" data-city="Ottawa" data-province="Ontario" data-postal="K1K 0R7" data-country="Canada" data-phone="613-688-2222" data-path="/apartments-for-rent/brittany-apartments-240-brittany-drive-ottawa/" data-type="High-rise-apartment" data-latitude="45.4461070" data-longitude="-75.6465360" >"""
# Name the parser explicitly so the result does not depend on what is installed.
soup = bs4.BeautifulSoup(data, "html.parser")
for btnMoreDetails in soup.findAll(attrs={"class": "property-item"}):
    print(btnMoreDetails["data-name"])
prints out
Brittany Apartments
If you want to get the data-name and data-path attributes, you can simply use the dictionary-like access to Tag's attributes:
for btnMoreDetails in citySoup.findAll(attrs={"class": "property-item"}):
    print(btnMoreDetails["data-name"])
    print(btnMoreDetails["data-path"])
Note that you can also use a CSS selector to match the property items:
for property_item in citySoup.select(".property-item"):
    print(property_item["data-name"])
    print(property_item["data-path"])
FYI, if you want to see all the attributes, use the .attrs property:
for property_item in citySoup.select(".property-item"):
    print(property_item.attrs)
I'm trying to click the following link using selenium.
<div id="RECORD_2" class="search-results-item">
<a hasautosubmit="true" oncontextmenu="javascript:return IsAllowedRightClick(this);" class="smallV110" href="#;cacheurlFromRightClick=no"></a>
</div>
Which record to click is not known before the code is executed. RECORD_2 has multiple children, and the one included above is the one I want to click. The link is edited for the sake of privacy. I tried something like the following, where name is the record variable, but it doesn't work.
driver.find_element_by_css_selector("css=div#"RECORD_%s" % (name).smallV110")
I'm a complete newbie to selenium so I couldn't figure out a way to sort this out. I would appreciate any help. Thanks!
Note that this is not Selenium IDE and you don't need the css= at the beginning of a selector.
There are multiple ways to locate the link element, e.g.:
driver.find_element_by_css_selector(".search-results-item a.smallV110")
driver.find_element_by_css_selector("[id^=RECORD] a.smallV110") # id starts with "RECORD"
If you know the id value beforehand:
id_i_know = 2
driver.find_element_by_css_selector("[id=RECORD_%d] a.smallV110" % id_i_know)
You don't have to have that smallV110 class attribute check - I've added it to increase chances of not matching other a elements inside the div (not sure what they are, you have not posted the entire HTML).
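Since the record number is only known at run time, it can help to build the selector string first and pass the finished string to Selenium. The sketch below shows one way to do that; the helper name record_link_selector and the example value 2 are assumptions for illustration, and the commented-out last line only indicates where a live driver would come in.

```python
def record_link_selector(record_number, link_class="smallV110"):
    """Build a CSS selector for the link inside the div with id RECORD_<n>."""
    # Format the id first, then append the descendant link part.
    return f"div#RECORD_{int(record_number)} a.{link_class}"


name = 2  # hypothetical record number, only known at run time
selector = record_link_selector(name)
print(selector)

# With a live Selenium driver the string would then be used like this:
# driver.find_element_by_css_selector(selector).click()
```

Keeping the formatting outside the call avoids the quoting problem in the original attempt, where the % operator ended up inside the selector string.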
I want to know how I can collect email addresses, including mailto links, from a contact page using Selenium with Python. The emails contain an @ sign. I tried the following XPath, but it works on some pages and not on others:
//*[contains(text(),"@")]
The email formats differ: sometimes it is <p>Email: name@domain.com</p>, sometimes <span>Email: name@domain.com</span>, and sometimes just name@domain.com.
Is there any way to collect them with one statement?
Thanks
Here is the XPath you are looking for, my friend.
//*[contains(text(),"@")]|//*[contains(@href,"@")]
You could create a collection of the link text values on the page that contain @ and then iterate through it to format them. You will have to clean up the span that contains "Email: name@domain.com" anyway.
Use find_elements_by_partial_link_text to build the collection.
I think you need two XPath expressions: the first finds elements that contain the text "Email:", and the second finds elements whose href attribute contains "mailto:".
//*[contains(text(),"Email:")]|//*[contains(@href,"mailto:")]
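Whichever XPath is used, the matched text and href values still need cleaning, because they carry extras such as "Email: " or "mailto:". One possible way to do that is a regular expression applied to the collected strings. The sketch below assumes the strings have already been gathered (for example from element.text and the href attribute); the pattern is deliberately simplified and will not cover every valid address, and sales@example.org is a made-up sample value.

```python
import re

# Simplified pattern: local part, "@", domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(strings):
    """Return the unique email addresses found in the strings, in order."""
    found = []
    for text in strings:
        for address in EMAIL_RE.findall(text):
            if address not in found:
                found.append(address)
    return found


# Hypothetical values as they might come back from the page.
collected = [
    "Email: name@domain.com",
    "mailto:name@domain.com",
    "sales@example.org",
    "No address here",
]

print(extract_emails(collected))
```

Deduplicating in order means an address that appears both as visible text and as a mailto link is reported only once.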