I want to know how I can collect the email/mailto links on a contact page using Selenium with Python. The emails contain an @ sign. I tried the following code, but it works on some pages and not on others:
//*[contains(text(),"@")]
The email formats differ: sometimes it is <p>Email: name@domain.com</p> or <span>Email: name@domain.com</span> or just name@domain.com
Is there any way to collect them all with one statement?
Thanks
Here is the XPath you are looking for my friend.
//*[contains(text(),"@")]|//*[contains(@href,"@")]
You could create a collection of the link text values on the page that contain @ and then iterate through it to format the results. You are going to have to format the span that has Email: name@domain.com anyway.
Use find_elements_by_partial_link_text to make the collection.
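To follow up on that approach: once the elements are collected, a small regex helper can normalise the varied formats (with or without the Email: prefix) into bare addresses. This is only a sketch; the driver calls in the comment assume the older find_elements_by_xpath API used elsewhere in this thread, and the pattern is deliberately simple rather than a full RFC-compliant email regex.

```python
import re

# Deliberately simple email pattern: word chars/dots/plus/hyphen, @, domain
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extract_email(raw):
    """Pull the address out of strings like 'Email: name@domain.com'."""
    match = EMAIL_RE.search(raw)
    return match.group(0) if match else None

# With Selenium you would feed this the .text of each matched element, e.g.:
#   elements = driver.find_elements_by_xpath('//*[contains(text(),"@")]')
#   emails = [extract_email(e.text) for e in elements]
texts = ["Email: name@domain.com", "name@domain.com", "no address here"]
print([extract_email(t) for t in texts])  # ['name@domain.com', 'name@domain.com', None]
```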
I think you need two XPaths: a first XPath for elements that contain the text "Email:", and a second for elements whose href attribute contains "mailto:".
//*[contains(text(),"Email:")]|//*[contains(@href,"mailto:")]
In an HTML page there is this line:
<td data-sort="funny" class="coin-name tw-text-right" style="min-width: 60px;">
and I can find it by using this XPath:
//tbody/tr/td[5]
But I am only interested in putting the "funny" into a variable. Keep in mind that the word "funny" changes all the time, so I need to find it and push it to a variable. How do I extract this changing text?
Thank you for helping ;-)
I am not sure if it will work 100%, but here is one potential solution:
1. If you open up that tag, you will find that the first child's second child (refer to the image in the solution) has a unique id attribute.
2. You can then use that unique attribute and work your way up to the parent tag with the data-sort attribute, using child-to-parent traversal in XPath. [Refer to the image; it basically explains the same approach written above][1]
[1]: https://i.stack.imgur.com/9Dc2k.png
3. Once you uniquely identify the td tag, you can use get_attribute() and store its value.
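With Python Selenium, the attribute read on the located td would simply be get_attribute("data-sort"). As a browser-free sketch of the same extraction (assuming the static snippet from the question), the stdlib parser below pulls the attribute out of a copy of the tag:

```python
from html.parser import HTMLParser

# With Selenium the call would be:
#   value = driver.find_element_by_xpath("//tbody/tr/td[5]").get_attribute("data-sort")
# The stdlib sketch below shows the same attribute extraction on static HTML.

class DataSortFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "td" and self.value is None:
            self.value = dict(attrs).get("data-sort")

html = '<td data-sort="funny" class="coin-name tw-text-right" style="min-width: 60px;">'
parser = DataSortFinder()
parser.feed(html)
print(parser.value)  # the changing word, here "funny"
```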
I want to grab certain data using Selenium. The data is located inside tags with the same class, so how do I grab it?
Those 2 are the data, but they are inside the same class.
I tried to use
driver.find_elements_by_class_name
but it doesn't work. Is there a way to grab it? Thanks.
Use the following XPath, "//*[@class='card-title']", with the function driver.find_elements_by_xpath. To check the correctness of the XPath, inspect the page, press Control+F (or Command+F), and put the XPath in the search bar; you will see whether it finds the elements you are looking for.
Then if you want the text inside:
elements = driver.find_elements_by_xpath("//*[@class='card-title']")
data = [element.text for element in elements]
Yes, there is. You can grab the first one like this:
driver.find_element_by_xpath("(//h3[@class='cart-title'])[1]").find_element_by_tag_name('b').text
and the second one like this:
driver.find_element_by_xpath("(//h3[@class='cart-title'])[2]").find_element_by_tag_name('b').text
I have a problem accessing some values on a website while web scraping. The text I want to extract is in a class which contains several texts separated by b tags (these b tags also hold text that is important to me).
So firstly, I tried to look for the b tag with the text I needed ('Category' in this case) and then extract the exact category from the text that follows it. I could use a precise XPath, but that is not an option here, because other pages I need to scrape contain a different number of rows in this sidebar, so the locations, and therefore the XPaths, differ.
The expected output is 'utility' - the category in the sidebar.
The website and the text I need to extract look like this (look at the sidebar on the right containing 'Category'):
The element looks like that:
And the code I tried:
driver = webdriver.Safari()
driver.get('https://www.statsforsharks.com/entry/MC_Squares')
element = driver.find_elements_by_xpath("//b[contains(text(), 'Category')]/following-sibling")
for value in element:
    print(value.text)
driver.close()
the link to the page with the data is https://www.statsforsharks.com/entry/MC_Squares.
Thank you!
You might be better off using regex here, as the whole text comes under the 'company-sidebar-body' class, where only some text is between b tags and some is not.
So, you can get the text of the class first:
sidebartext = driver.find_element_by_class_name("company-sidebar-body").text
That will give you the following:
"EOY Proj Sales: $1,000,000\r\nSales Prev Year: $200,000\r\nCategory: Utility\r\nAsking Deal\r\nEquity: 10%\r\nAmount: $300,000\r\nValue: $3,000,000\r\nEquity Deal\r\nSharks: Kevin O'Leary\r\nEquity: 25%\r\nAmount: $300,000\r\nValue: $1,200,000\r\nBite: -$1,800,000"
You can then use regex to target the category:
import re
c = re.search(r"Category:\s\w+", sidebartext).group()
print(c)
c will result in 'Category: Utility' which you can then work with. This will also work if the value of the category ('Utility') is different on other pages.
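If you want only the value itself rather than the 'Category: ' prefix, a capture group in the same regex avoids any extra string slicing. A minimal sketch, using a shortened copy of the sidebar text from above:

```python
import re

sidebartext = ("EOY Proj Sales: $1,000,000\r\nSales Prev Year: $200,000\r\n"
               "Category: Utility\r\nAsking Deal\r\nEquity: 10%")

# The parentheses capture just the word after "Category: "
match = re.search(r"Category:\s(\w+)", sidebartext)
category = match.group(1) if match else None
print(category)  # Utility
```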
There are easier ways when it's a MediaWiki website. You could, for instance, access the page data through the API with a JSON request and parse it with a much more limited DOM.
Any particular reason you want to scrape my website?
So I am trying to scrape a list of email addresses from my User Explorer page in Google Analytics.
I obtained the XPath via here.
The item's XPath is //*[@id="ID-explorer-table-dataTable-key-0-0"]/div
But no matter what I do:
driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
or
driver.find_elements_by_xpath('//*[@id="ID-reportContainer"]')
or
driver.find_elements_by_id(r"ID-explorer-table-dataTable-key-0-0")
it returns an empty list.
Can anyone tell me where I have gone wrong?
I also tried using:
html = driver.page_source
but of course I couldn't find the list of the emails there either.
I am also thinking, if this doesn't work, whether there is a way to automate Control+A, copy all the displayed text into a string in Python, and then use re.findall() to find the email addresses?
email = driver.find_element_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
print("email", email.get_attribute("innerHTML"))
Thanks for the help of @Guy!
It was something related to an iframe. This worked and detected which frame the item I need belongs to:
iframelist = driver.find_elements_by_tag_name('iframe')
for i in range(len(iframelist)):
    driver.switch_to.frame(iframelist[i])
    if len(driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')) != 0:
        print('it is item {}'.format(i))
        break
    else:
        driver.switch_to.default_content()
I'm trying to write a test for my site and am having trouble with a user form. The trick is that the number of text fields in the form varies depending on user options (disabled ones are present in the code but carry a display: none; style), so I'm trying to find a more flexible approach than locating every element one by one and filling the form with try/except blocks.
I'm using an xpath locator
text_fields = driver.find_elements_by_xpath("//div[@class='form-line']/div[@class='form-inputs']/input[@type='text' and not(ancestor::div[@style='display: none;'])]")
The trouble is that Firebug locates only the needed elements, but when I use the same locator in my Selenium script, printing the list of text_fields gives me all the elements, including the ones hidden with display: none;.
How can I get only visible elements?
PS Sorry for my bad English ^_^
You can get all the form elements the usual way, then iterate on the list and remove those elements that do not return true on is_displayed().
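A minimal sketch of that filtering, with a stub class standing in for WebElement so it runs without a browser; the real code would apply the same list comprehension to the elements Selenium returns:

```python
# With real Selenium elements the filter is just:
#   visible = [f for f in text_fields if f.is_displayed()]
# StubField below mimics the one WebElement method we need.

class StubField:
    def __init__(self, name, displayed):
        self.name = name
        self._displayed = displayed

    def is_displayed(self):
        # Selenium's is_displayed() returns False for display: none; elements
        return self._displayed

fields = [StubField("email", True), StubField("phone", False), StubField("name", True)]
visible = [f for f in fields if f.is_displayed()]
print([f.name for f in visible])  # ['email', 'name']
```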
Try the contains() method:
text_fields = driver.find_elements_by_xpath(
    "//div[@class='form-line']/div[@class='form-inputs']"
    "/input[@type='text' and not(ancestor::div[contains(@style, 'display: none;')])]")
The important part is:
div[contains(@style, 'display: none;')]
Note that if the style contains the string display:none; or display:none (without the space after the colon), this selector won't match.
I use the following and it works great.
self.assertTrue(driver.find_element_by_xpath("//div[@id='game_icons']/div/div[2]/div/a/img"))
This is for Selenium and Python of course.