I'm doing some scraping with Selenium in Python. My problem is that when I read WebElement.text, it gives me a single-line string with no formatting. I want to get the text just as the web page shows it, that is, with the line breaks.
For example, the element with text:
'Hello this is an example'
In the web it shows as:
'Hello this is an
example'
I want the second result, but Selenium gives me the first one. I tried to format the text 'manually' using the width of the words measured with PIL, but the results are quite inexact.
Instead of using the text attribute, you need to use get_attribute("innerHTML") as follows:
print(WebElement.get_attribute("innerHTML"))
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
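If the line breaks on the page come from <br> tags (an assumption; wrapping caused purely by CSS width does not appear in the markup), the innerHTML string can be turned back into multi-line text. A minimal sketch, where inner_html_to_text is a hypothetical helper and not part of Selenium:

```python
import html
import re

def inner_html_to_text(inner_html):
    """Convert an innerHTML string to plain text, keeping <br> as newlines."""
    # Turn every <br>, <br/> or <br /> into a newline.
    text = re.sub(r"<br\s*/?>", "\n", inner_html, flags=re.IGNORECASE)
    # Drop any remaining tags (a rough approach, fine for simple markup only).
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp;.
    return html.unescape(text).strip()

# With Selenium the input would come from element.get_attribute("innerHTML").
sample = "'Hello this is an<br>example'"
print(inner_html_to_text(sample))
```

For anything beyond simple markup, a real HTML parser would be a safer choice than regular expressions.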
References
Link to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
Related
I am trying to scrape job postings and I'm having Selenium go to each individual posting to get some text. The problem is the structure of the page isn't the same for every posting so I'm trying to tell Selenium to grab the text from the element that immediately follows a div that contains the text Job Scope since that is where the text always is.
Here is one site, again it's not the same for every one but is always after Job Scope/Duties: https://recruitingbypaycor.com/career/JobIntroduction.action?clientId=8a87142e46c6fe710146f995773e6461&id=8a78839e812f7de70181456c3ad709ff&source=&lang=en
Here is my code:
job_descriptions = WebDriverWait(browser, 10).until(EC.presence_of_all_elements_located((By.ID,"gnewtonJobDescriptionText")))
for job in job_descriptions:
    job_description.append(job.find_elements(By.XPATH, "//div[contains(text(),'Job Scope')]/following-sibling::div"))
I've got the script to run, but it produces an empty list. When I add .text to the end, it errors out saying 'list' object has no attribute 'text' (obviously, since it's producing an empty list). The only thing I can think of is that the text Job Scope is actually within a b tag inside the preceding div, so maybe that's why?
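One way to check the b-tag hypothesis offline is to run both XPath patterns with lxml against a small stand-in snippet. The markup below is invented for illustration and is not the real page. The second expression is only a sketch of one possible variant that tests the b child instead of the div's own text nodes:

```python
from lxml import etree

# Invented markup: the label sits in a <b> inside the preceding div.
SAMPLE = """
<div id="gnewtonJobDescriptionText">
  <div><b>Job Scope</b></div>
  <div>Maintain the scraping pipeline.</div>
  <div>Other section</div>
</div>
"""

tree = etree.HTML(SAMPLE)

# Original pattern: text() only sees the div's direct text nodes, so no match.
direct = tree.xpath("//div[contains(text(),'Job Scope')]/following-sibling::div")

# Variant: match a div whose <b> child holds the label, then take the next div.
via_b = tree.xpath("//div[b[contains(.,'Job Scope')]]/following-sibling::div[1]")

# Read the text from each element, since the query result is a list.
texts = ["".join(el.itertext()).strip() for el in via_b]
print(len(direct))
print(texts)
```

In Selenium the same expression could be passed to job.find_elements(By.XPATH, ...), with a leading dot (".//div[...]") so the search stays inside job, and .text read from each item of the returned list rather than from the list itself.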
Here, I want to scrape a website called "fundsnetservices.com." Specifically, I want to grab the text below each program — it's about a paragraph's worth of text.
Using the Google Chrome Inspect method, I was able to pull this...
'/html/body/div[3]/div/div/div[1]/div/p[2]/text()'
... as the xpath. However, every time I print the text out, it returns [ ]. Why might this be?
response = urllib.request.urlopen('http://www.fundsnetservices.com/searchresult/30/International-Grants-&-Funders/18.html')
tree = etree.HTML(response.read().decode('utf-16'))
text = tree.xpath('/html/body/div[3]/div/div/div[1]/div/p[2]/text()')
It seems your code returns whitespace nodes. Correct your XPath with:
//p[@class="tdclass"]/text()[3]
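If the position of the useful text node is not always the same, another option is to drop the whitespace-only nodes in Python instead of hard-coding an index. A small sketch; non_blank is a hypothetical helper and the list is a made-up stand-in for what a text() query might return:

```python
def non_blank(text_nodes):
    """Keep only text nodes that contain something other than whitespace."""
    return [node.strip() for node in text_nodes if node.strip()]

# Made-up stand-in for the list returned by a text() XPath query.
nodes = ["\r\n  ", "\n", "  A paragraph describing the program.  ", "\n"]
print(non_blank(nodes))
```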
I am using the following code using Python 3.6 and selenium:
element = driver.find_element_by_class_name("first_result_price")
print(element)
On the website it looks like this:
<span class="first_result_price">712</span>
However, if I print element, I get a completely different number.
Any suggestions?
many thanks!!
"element" is a type of object called WebElement that Selenium adds. If you want to find the text inside that element, you have to say
element.text
Which should return what you're looking for, '712', albeit in string form.
Hi, I would like to scrape what is selected in the following image:
Image Code
I know I could use the following code to get the text:
cell = driver.find_elements_by_xpath(".//*[@id='ip_selection1233880116name']")
print cell.text
But my problem is that ip_selection1233880116name should be dynamic, given that it changes every time as you can see from the image.
How could I do it?
Thanks a lot for your help!!!!
Use contains to match just the fixed parts of the id, presuming the numbers are all that change. For a single element you should also use find_element as opposed to find_elements:
find_element_by_xpath("//*[contains(@id,'ip_selection') and contains(@id,'name')]")
You could also use starts-with and ends-with depending on the browser:
find_element_by_xpath("//*[starts-with(@id,'ip_selection') and ends-with(@id,'name')]")
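Browsers generally implement XPath 1.0, which has starts-with() but no ends-with(), so the second expression may be rejected. A common XPath 1.0 workaround compares the tail of the attribute using substring(). A sketch, where dynamic_id_xpath is a hypothetical helper and the prefix and suffix are assumed to contain no single quotes:

```python
def dynamic_id_xpath(prefix, suffix):
    """Build an XPath 1.0 expression for ids that start and end with fixed parts."""
    # substring(@id, string-length(@id) - len(suffix) + 1) is the tail of the id,
    # which emulates the XPath 2.0 ends-with() function.
    tail_start = "string-length(@id) - {} + 1".format(len(suffix))
    return (
        "//*[starts-with(@id,'{p}') and "
        "substring(@id, {t}) = '{s}']"
    ).format(p=prefix, t=tail_start, s=suffix)

xpath = dynamic_id_xpath("ip_selection", "name")
print(xpath)
```

The resulting string can be passed to find_element_by_xpath(xpath), or to driver.find_element(By.XPATH, xpath) in newer Selenium versions.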
I want to know how I can collect the lines and mailto links that contain emails using Selenium with Python. The emails contain an @ sign. On the contact page I tried the following XPath, but it works on some pages and not on others:
//*[contains(text(),"@")]
The email formats differ: sometimes it is <p>Email: name@domain.com</p>, sometimes <span>Email: name@domain.com</span>, and sometimes just name@domain.com.
Is there any way to collect them with one statement?
Thanks
Here is the XPath you are looking for, my friend.
//*[contains(text(),"@")]|//*[contains(@href,"@")]
You could create a collection of the link text values that contain @ on the page and then iterate through them to format each one. You are going to have to format the span that has Email: name@domain.com anyway.
Use find_elements_by_partial_link_text to make the collection.
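The formatting step could be a regular expression run over the collected text and href values. A sketch under assumptions: extract_emails is a hypothetical helper, the pattern is deliberately simple and will not cover every valid address, and the sample strings are invented:

```python
import re

# Deliberately simple pattern; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(values):
    """Return unique addresses found in text or href strings, in first-seen order."""
    found = []
    for value in values:
        # Elements without an href give None, so fall back to an empty string.
        for match in EMAIL_RE.findall(value or ""):
            if match not in found:
                found.append(match)
    return found

# Invented samples: element text, a mailto: href, and a missing attribute.
samples = ["Email: name@domain.com", "mailto:name@domain.com", "sales@example.org", None]
print(extract_emails(samples))
```

With Selenium, the input list would be built from each matched element's .text and get_attribute("href").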
I think you need two XPath expressions: the first finds elements whose text contains "Email:", and the second finds elements whose href attribute contains "mailto:".
//*[contains(text(),"Email:")]|//*[contains(@href,"mailto:")]