In my page to scrape I have some elements like
<a href='xxxx' class='main-class class2 class3'>
and other elements like
<a href='xxxx' class='main-class class4 class5'>
I want to retrieve all these elements, so I did
elems = driver.find_elements_by_xpath("//a[@class='main-class']")
but it doesn't retrieve anything. I can't search on one class only.
For the moment, it is working only if I indicate the three classes.
Thank you
You should use the following XPath:
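elems = driver.find_elements_by_xpath("//a[contains(@class,'main-class')]")
This will give you all the <a> elements whose class attribute contains main-class. Your original expression, @class='main-class', only matches when the whole attribute value is exactly that string.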
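One caveat: contains(@class, 'main-class') is a substring test, so it would also match a class such as main-class-wide. Below is a minimal sketch of a stricter, whole-token match. The helper name xpath_for_class is hypothetical (it is not part of Selenium) and it only builds the XPath string.

```python
def xpath_for_class(tag: str, class_name: str) -> str:
    """Build an XPath that matches `tag` elements whose class list contains
    `class_name` as a whole token (hypothetical helper, not a Selenium API)."""
    # Pad the normalized class attribute with spaces so that every class
    # token, including the first and the last, is surrounded by spaces.
    padded = "concat(' ', normalize-space(@class), ' ')"
    return f"//{tag}[contains({padded}, ' {class_name} ')]"


xpath = xpath_for_class("a", "main-class")
print(xpath)
```

The resulting string can be passed to the driver in the same way as above, for example driver.find_elements_by_xpath(xpath).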
I am trying to extract data from multiple pages of search results where the HTML in question looks like so:
<ul>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
<li class="Card___StyledLi4-ulg8ho-7 jmevwM">...</li>
</ul>
I want to extract the text from the "li" tags, so I have:
text_data = WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,'Card___StyledLi4-ulg8ho-7.jmevwM')))
print(text_data.text)
to wait for and target the "li" items. However, I get a "TimeoutException" error.
If I locate a single "li" item using its XPath under the same conditions, the data is returned, which makes me wonder whether I am specifying the class correctly.
Can anyone tell me what I'm doing wrong? Please let me know if there is any further information, you'd like me to provide.
I believe the XPath for these list items would be //li[@class="Card___StyledLi4-ulg8ho-7 jmevwM"] (or //*[@class="Card___StyledLi4-ulg8ho-7 jmevwM"] if you want all elements with that class rather than just li tags). You can take a look at this cheatsheet and this tutorial for further rules and examples of XPath.
You can also just use CSS Selectors like (By.CSS_SELECTOR, '.Card___StyledLi4-ulg8ho-7.jmevwM') in this case.
You have used the wrong locator type: the string is a CSS selector, so it should be By.CSS_SELECTOR. Each class name also needs a leading dot ('.'), because that is how CSS selects by class:
text_data = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '.Card___StyledLi4-ulg8ho-7.jmevwM')))
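To make the "dot in front of each class" rule concrete, here is a small sketch that converts a class attribute value into the matching CSS selector. The helper name css_for_classes is hypothetical, and it assumes the class names need no CSS escaping (for example, none of them starts with a digit).

```python
def css_for_classes(class_attr: str) -> str:
    """Turn a class attribute value into a CSS selector that requires every
    listed class (hypothetical helper, not a Selenium API)."""
    # split() with no argument ignores repeated or trailing whitespace.
    return "".join("." + name for name in class_attr.split())


selector = css_for_classes("Card___StyledLi4-ulg8ho-7 jmevwM")
print(selector)
```

Also note that visibility_of_all_elements_located returns a list of elements, so the text has to be read per item, for example with a loop such as for item in text_data: print(item.text).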
I'm using Selenium to scrape a Web Page and I'm having some problems targeting some attributes.
The page I'm trying to scrape looks like this:
<div>
<span abc> content </span>
<span def> content2 </span>
</div>
My goal would be to retrieve the text within the "span abc" tag, without selecting the other text included in the "span def" tag.
I've tried multiple approaches and looked at a lot of different resources but I wasn't able to find the right approach, since I don't want to select all the spans at the same time and I don't want to search based on the text within the tags.
A simple approach would be indexing, since you said: "I don't want to select all the spans at the same time and I don't want to search based on the text within the tags."
If abc is an attribute, please use:
//div/span[@abc]
or, with indexing:
(//div/span[@abc])[1]
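The attribute-presence predicate [@abc] can be tried without a browser. The sketch below uses Python's standard-library ElementTree as a stand-in for Selenium; it supports only a subset of XPath (the parenthesised indexing form above is not available there), and because it parses XML, the valueless attributes are written as abc="" and def="". The <body> wrapper is added only so that the div is not the root element.

```python
import xml.etree.ElementTree as ET

# Same structure as the question, made well-formed for an XML parser.
html = (
    "<body><div>"
    '<span abc=""> content </span>'
    '<span def=""> content2 </span>'
    "</div></body>"
)
root = ET.fromstring(html)

# [@abc] keeps only the spans that carry an abc attribute, whatever its value.
matches = root.findall(".//div/span[@abc]")
texts = [el.text.strip() for el in matches]
print(texts)
```

Only the first span is selected, so the text of the "span def" element is never touched.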
If you only want to pull the first span out of these two, you could easily do this with the XPATH. It would look like this:
span = driver.find_element_by_xpath("/html/body/div/span[1]").text
If you want every span and need to act on each one, loop over the list that find_elements returns:
spans = driver.find_elements_by_xpath("/html/body/div/span")
for span in spans:
    # each item is a WebElement, so .text gives its visible text
    print(span.text)
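Note that Selenium can only return elements, not text nodes, so an XPath ending in /text() (for example //span[1]/text()) raises an InvalidSelectorException. Locate the <span> element itself and read its .text property instead:
span_text = driver.find_element_by_xpath("/html/body/div/span[1]").text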
I have an HTML code like this:
<span class="twitter-label">
Connect Your Twitter Account
</span>
and
<span class="twitter-label">
Follow
</span>
How can I select the second element with this class name?
You can use driver.find_elements with "By.CLASS_NAME":
from selenium.webdriver.common.by import By
lst = driver.find_elements(By.CLASS_NAME, 'twitter-label')
This gives you the list of span elements whose class name is "twitter-label". You can get the second element in the list with lst[1] and its text with lst[1].text, or click it with lst[1].click().
If you are not sure that it is the second element matching that class, you can also check its text, or use "By.XPATH" to test whether its text contains 'Follow'.
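A minimal sketch of the "check its text" idea follows. The helper name pick_by_text is hypothetical; it only reads the .text attribute, so it should work on Selenium elements as well as on the stand-in objects used here to keep the example runnable without a browser.

```python
from collections import namedtuple


def pick_by_text(elements, needle):
    """Return the first element whose .text contains `needle`, ignoring case,
    or None if nothing matches (hypothetical helper)."""
    wanted = needle.lower()
    for element in elements:
        if wanted in (element.text or "").lower():
            return element
    return None


# Stand-in objects; in real use the list would come from
# driver.find_elements(By.CLASS_NAME, 'twitter-label').
FakeElement = namedtuple("FakeElement", "text")
lst = [FakeElement("Connect Your Twitter Account"), FakeElement("Follow")]

follow = pick_by_text(lst, "follow")
print(follow.text)
```

This avoids depending on the element's position in the list, at the cost of depending on the label text.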
You can use .find_elements_by_class_name("twitter-label"). Note the plural: it returns a list of the elements found, so you can index the one you want:
To access the Connect use:
driver.find_elements_by_class_name("twitter-label")[0].click()
To access the Follow use:
driver.find_elements_by_class_name("twitter-label")[1].click()
Edit:
The find_elements_by_* functions are deprecated; you should switch to find_elements().
You are searching by class, so pass By.CLASS_NAME as the first argument and the class name you want to find as the second.
So it should look like this:
driver.find_elements(By.CLASS_NAME, "twitter-label")[0].click()
I'm trying to scrape the span from a button that has a particular class. This is the relevant HTML from the page:
<button class="sqdOP yWX7d _8A5w5 " type="button">altri <span>17</span></button>
I'd like to extract the "17", which changes every time. Thanks.
I've tried this, but it doesn't work:
for item in soup.find_all('button', {'class': 'sqdOP yWX7d _8A5w5 '}):
For complex selections, it's best to use selectors. These work much like CSS selectors.
p selects an element with the type p.
p.example selects an element with type p and class example.
p span selects any span inside a p.
There are also others, but only these are needed for this example.
These can be nested as you like. For example, p.example span.foo selects any span with class foo inside any p with class example.
Now, an element can have multiple classes, and they are separated by spaces. <p class="foo bar">Hello, World!</p> has both foo and bar as class.
I think I am safe to assume the class sqdOP is unique. You can build the selector pretty easily using the above:
button.sqdOP span
Now, issue select, and BeautifulSoup will return a list of matching elements. If this is the only one, you can safely use [0] to get the first item. So, the final code to select that span:
soup.select('button.sqdOP span')[0]
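Putting it together on the button from the question, here is a short sketch assuming BeautifulSoup 4 with the built-in html.parser. It uses select_one rather than select(...)[0], which is a swapped-in variant that returns None instead of raising IndexError when nothing matches.

```python
from bs4 import BeautifulSoup

html = '<button class="sqdOP yWX7d _8A5w5 " type="button">altri <span>17</span></button>'
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match, or None if the button is not on the page.
span = soup.select_one("button.sqdOP span")
count = int(span.get_text(strip=True)) if span is not None else None
print(count)
```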
I am looking to retrieve the value "312 votes" from the below tag hierarchy:
<div class="rating-rank right">
<span class="rating-votes-div-65211">312 votes</span>
</div>
The problem seems to be that the span's class contains a unique identifier for each value on the page ('65211' in the case above). What should I do to retrieve the required value?
I am using soup.select to get the values, but it doesn't seem to work.
for tag in soup.select('div.rating-rank right'):
    try:
        print(tag.string)
    except KeyError:
        pass
Your selector, div.rating-rank right, looks for an element named right inside a div with class rating-rank, because the space is a descendant combinator. You can select what you want like this:
soup.select("div.rating-rank.right span")
Read CSS selectors from right to left: div.rating-rank.right span means a span element that sits inside a div element having both rating-rank and right as classes. Once you have identified your span elements, you can print their contents as you already do.
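As a runnable sketch of the selector on the markup from the question (assuming BeautifulSoup 4 with html.parser): the first query goes through the parent div, and the second is an alternative that matches the changing class by its prefix. The prefix form only works when rating-votes-div-... is the first class in the attribute, which is an assumption based on the sample shown.

```python
from bs4 import BeautifulSoup

html = """
<div class="rating-rank right">
<span class="rating-votes-div-65211">312 votes</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Any span inside a div that has both the rating-rank and right classes.
votes = [span.get_text(strip=True) for span in soup.select("div.rating-rank.right span")]

# Alternative: match the class attribute by prefix, ignoring the numeric suffix.
by_prefix = [span.get_text(strip=True) for span in soup.select('span[class^="rating-votes-div-"]')]

print(votes)
```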