I'm trying to scrape a website using Selenium and Python, and I got stuck on a field the website calls 'textarea'. This is the HTML of the area I'm trying to extract text from:
<textarea class="script" onclick="this.focus();this.select()" readonly="readonly" id="script">
After this code comes the text that I want to get. Here is the code that I'm using:
getCode = driver.find_elements_by_tag_name('textarea')
My problem is that it does not recognize the text with any of the following calls:
getCode.submit()
getCode.click()
getCode.text()
This is the code error that I always get:
Traceback (most recent call last):
File "ScprL.py", line 55, in <module>
print (repr(getCode.text))
AttributeError: 'list' object has no attribute 'text'
I would appreciate your help!
You should use driver.find_element_by_tag_name instead.
When you use driver.find_elements you get a list of WebElements, so if you keep that call you have to extract the element from the list:
elem = driver.find_elements_by_tag_name('textarea')[0]
print(elem.text)
If you have multiple textareas on the page, you should find the one you need like below:
textareas = driver.find_elements_by_tag_name('textarea')
for i, textarea in enumerate(textareas):
    print('{} is at index {}'.format(textarea.text, i))
And then use the appropriate i value to get textareas[i]
Because you are using driver.find_elements_by_tag_name('textarea'), you get back a list of web elements. You need to iterate over that list and get the text of each web element in turn. Below is an example in Java:
List<WebElement> ButtonNamelist = driver.findElements(By.cssSelector(".locatorHere"));
System.out.println(ButtonNamelist.size());
for (int i = 0; i < ButtonNamelist.size(); i++) {
    System.out.println(ButtonNamelist.get(i).getText());
}
Thank You,
Murali
There are two functions for each locator in Selenium: "find_elements" and "find_element". The difference is pretty simple: the first one returns a list of all elements that match the selector, and the second one returns only the first matching element. You can read more about this in the Selenium documentation on locating elements.
So you either need to change your call to find_element_by_tag_name or retrieve the first element from the list: find_elements_by_tag_name('textarea')[0].
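For anyone reading this with a recent Selenium: the find_element_by_* and find_elements_by_* helpers used above were deprecated in Selenium 4 and later removed, in favour of find_element(by, value) and find_elements(by, value). Here is a minimal sketch of both approaches from the answers. It assumes `driver` is an already-started WebDriver, and the helper names are made up for illustration.

```python
# By.TAG_NAME in selenium.webdriver.common.by is simply the string "tag name".
# It is spelled out here so the sketch has no import-time dependency.
TAG_NAME = "tag name"


def first_textarea_text(driver):
    """Return the text of the first <textarea>, or None if the page has none."""
    # find_elements always returns a list (possibly empty), never a single element.
    textareas = driver.find_elements(TAG_NAME, "textarea")
    if not textareas:
        return None
    # .text is a property, so it is read without parentheses.
    return textareas[0].text


def all_textarea_texts(driver):
    """Return the text of every <textarea> on the page, in document order."""
    return [textarea.text for textarea in driver.find_elements(TAG_NAME, "textarea")]
```

Checking for an empty list first avoids the IndexError that a bare [0] raises when nothing matches.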
I'm doing some scraping with Selenium in Python. My problem is that when I read WebElement.text, it gives me a single-line string with no formatting. But I want to get the text just as the web page shows it, that is, with the line breaks.
For example, the element with text:
<br>'Hello this is an example'<br>
In the web it shows as:
<br>
'Hello this is an<br>
example'
I want the second result, but Selenium gives me the first one. I tried to format the text 'manually' by measuring the width of the words with PIL, but the results are quite inexact.
Instead of using the text attribute, you need to use get_attribute("innerHTML") as follows:
print(WebElement.get_attribute("innerHTML"))
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Link to useful documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium
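Note that innerHTML gives you markup, not formatted text. If the line breaks on the page come from <br> tags (an assumption; breaks caused purely by CSS wrapping will not show up in innerHTML at all), you can turn that markup back into text with newlines. A rough sketch using only the standard library; the class and function names are made up for illustration:

```python
from html.parser import HTMLParser


class BrToNewline(HTMLParser):
    """Collect the text of an HTML fragment, turning each <br> into a newline."""

    def __init__(self):
        super().__init__()  # character references such as &amp; are decoded by default
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/>, via the default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html):
    """Return the fragment's text content with <br> tags rendered as newlines."""
    parser = BrToNewline()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

You would call it as inner_html_to_text(element.get_attribute("innerHTML")). All other tags are simply dropped and only their text is kept.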
I'm trying to print and/or write to file text inside a span tag from the following HTML. Only want it to find_element once, not find_elements since there's only one instance:
<div>
<span class="test">2</span>
</div>
Below is the python code I'm using that is generating the "'WebElement' object is not iterable" error.
test = driver.find_element_by_xpath("/html/body/div")
for numberText in test:
    numberTexts = numberText.find_element_by_class_name("test")
    print(numberTexts.txt)
You're getting a single element (first one) by:
driver.find_element_by_xpath("/html/body/div")
which is obviously not iterable.
For multiple elements i.e. to get an iterable, use:
driver.find_elements_by_xpath("/html/body/div")
Note the s after element.
Also check out the documentation.
A single element is not iterable. Try find_elements_by_xpath (note the plural, elements).
If there is only one instance, then simply use it without the for loop.
The error is pretty clear: the return value of find_element_by_xpath is a WebElement, and you can't iterate over a WebElement.
You also have several other mistakes in your example code (for instance, the attribute is .text, not .txt). You could rewrite the entire block of code as:
element = driver.find_element_by_class_name("test")
print(element.text)
I am using the following code using Python 3.6 and selenium:
element = driver.find_element_by_class_name("first_result_price")
print(element)
On the website it looks like this:
<span class="first_result_price">712</span>
However, if I print element I get a completely different number?
Any suggestions?
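many thanks!!
"element" is an object of a type called WebElement, which Selenium provides. Printing it shows the object's own representation (most likely including its internal session and element identifiers), not the content of the page. If you want the text inside that element, you have to use
element.text
which should return what you're looking for, '712', albeit as a string.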
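Since .text always gives you a string, you have to convert it yourself if you need a number. A minimal sketch, assuming the text is a plain integer such as '712' with at most some whitespace around it (thousands separators or currency symbols would need extra handling); the helper name is made up:

```python
def element_text_to_int(text):
    """Convert element text such as ' 712 ' to the integer 712.

    Raises ValueError if the text is not a plain integer.
    """
    return int(text.strip())
```

With the element from the question you would call element_text_to_int(element.text).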
How can I get an element at this specific location:
The XPath is:
//*[@id="id316"]/span[2]
I got this path from the Google Chrome browser. I basically want to retrieve the number at this specific location with the following statement:
zimmer = response.xpath('//*[@id="id316"]/span[2]').extract()
However I'm not getting anything but an empty string. I found out that the id value is different for each element in the list I'm interested in. Is there a way to write this expression such that it works for generic numbers?
Use the corresponding label and get the following sibling element containing the value:
//span[. = 'Zimmer']/following-sibling::span/text()
As a bonus, the locator is also more readable.
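To illustrate the idea of locating the value by its label rather than by a changing id, here is a rough standard-library sketch. Python's xml.etree.ElementTree does not support the following-sibling axis, so this walks the siblings by hand instead of evaluating the XPath above, and it only looks at the immediately following span. The markup, ids and helper name are hypothetical, made up to resemble the page in the question.

```python
import xml.etree.ElementTree as ET


def value_after_label(root, label):
    """Return the text of the <span> right after the <span> whose text is `label`."""
    for parent in root.iter():
        children = list(parent)
        # Pair each child with the sibling that directly follows it.
        for current, following in zip(children, children[1:]):
            if (current.tag == "span" and following.tag == "span"
                    and (current.text or "").strip() == label):
                return (following.text or "").strip()
    return None  # no such label found


# Hypothetical markup: the ids differ per item, the labels do not.
SAMPLE = """
<ul>
  <li id="id316"><span>Zimmer</span><span>3</span></li>
  <li id="id317"><span>Etage</span><span>2</span></li>
</ul>
"""

root = ET.fromstring(SAMPLE)
print(value_after_label(root, "Zimmer"))
```

The lookup never mentions id316, so it keeps working when the ids change from item to item.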
I'm trying to pull out the first URL in a list of URL tags using beautifulsoup and am getting hung up. So far I have been able to get the results that I'm looking for using the following bit of code.
rows = results.findAll('p',{'class':'row'})
for row in rows:
    for link in row.findAll('a'):
        print(link)
This prints three <a> tags similar to the following.
1
2
3
What I am looking to do is to extract out just the URL from the first a href. I found another post that describes doing this with some regex but so far I haven't been able to get that to work correctly.
I keep getting this error message:
Traceback (most recent call last):
File "./scraper.py", line 25, in <module>
for link in row.find('a', href=re.compile('^http://')):
TypeError: 'NoneType' object is not iterable
Any help or direction would be appreciated. Let me know what other details I need to post.
You don't need to use findAll if you only want the first result - you can use find.
Html attributes are exposed as a dictionary in BeautifulSoup.
Finally, if the second argument to find is a string instead of a dict, it's used as the class. You could also provide it as a named argument, which has to be spelled class_ because class is a reserved word in Python: find('p', class_='row').
Knowing this, you can accomplish what you want with a simple line:
results.find('p','row').find('a')['href']
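One caveat with the one-liner: find returns None when nothing matches, which is also what caused the 'NoneType' error in the question, so chaining the calls fails on pages without a matching row or link. A more defensive sketch, assuming `results` is a BeautifulSoup object or Tag as in the question; the function name is made up:

```python
def first_row_href(results):
    """Return the href of the first <a> in the first <p class="row">, or None."""
    row = results.find("p", class_="row")
    if row is None:  # find() returns None when nothing matches
        return None
    # href=True restricts the match to <a> tags that actually have an href.
    link = row.find("a", href=True)
    if link is None:
        return None
    return link["href"]  # attributes are accessed like dictionary keys
```

Called as first_row_href(results), it gives you either the URL string or None, which you can check before using it.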