Find specific number of things with Beautiful soup - python

I know that the find() command finds only the first occurrence and that find_all() finds all of them. Is there a way to find a specific number?
If I want to find only the first two occurrences, is there a method for that, or does that need to be handled in a loop?

You can use CSS selectors knowing the child position you need to extract. Let's assume the HTML you have is like this:
<div id="id1">
<span>val1</span>
<span>val2</span>
<span>val2</span>
</div>
Then you can select the first element by the following:
child = div.select('span:nth-child(1)')
Replace 1 with the position you want. Note that select() returns a list, even when only one element matches.
If you want to select multiple occurrences, you can concatenate the children like this:
child = div.select('span:nth-child(1)') + div.select('span:nth-child(2)')
to get the first two children.
The nth-child selector can also select the odd-numbered children:
child = div.select('span:nth-child(2n+1)')
where n starts from 0:
n: 0 => 2n+1: 1
n: 1 => 2n+1: 3
..
Edited after addressing the comment, thanks!

If you are looking for first n elements:
As pointed out in the comments, you can use find_all to find all elements and then take as many as you need with a list slice.
soup.find_all(...)[:n] # get first n elements
Or, more efficiently, you can use the limit parameter of find_all to cap the number of elements returned.
soup.find_all(..., limit=n)
This is more efficient because the search stops as soon as the limit is reached instead of walking through the whole page.
Refer to the documentation for more.
If you are looking for the nth element:
In this case you can use the :nth-child pseudo-class of CSS selectors, replacing n with the position you want:
soup.select_one('span:nth-child(n)')
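The options above can be compared side by side in a minimal sketch. The HTML snippet is made up for illustration (modelled on the example earlier in this thread), and it assumes a Beautiful Soup version recent enough to support CSS selectors through select() and select_one().

```python
from bs4 import BeautifulSoup

# Sample markup invented for this sketch.
html = """
<div id="id1">
  <span>val1</span>
  <span>val2</span>
  <span>val3</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Stop searching after two matches.
first_two = soup.find_all("span", limit=2)

# Find everything, then slice; same result, but the whole tree is searched.
sliced = soup.find_all("span")[:2]

# select() accepts a limit argument as well.
selected = soup.select("#id1 > span", limit=2)

# Pick a single child by its position among its siblings.
second = soup.select_one("#id1 > span:nth-child(2)")

first_two_text = [s.get_text() for s in first_two]
sliced_text = [s.get_text() for s in sliced]
selected_text = [s.get_text() for s in selected]
print(first_two_text, second.get_text())
```

Keep in mind that :nth-child counts every element sibling, not only the spans, so the position can shift when other tags sit between them.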

Related

How to search for an element inside another one with selenium

I am searching for elements that contain a certain string with find_elements. Then, for each of these elements, I need to ensure that it contains a span with a certain text. Is there a way to create a for loop and use the list of elements I got?
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
I believe you should be able to search within each element of a loop, something like this:
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
for element in today:
    span = element.find_element(By.TAG_NAME, "span")
    if span.text == "The text you want to check":
        ...  # do something
Let me know if that works.
Sure, you can.
You can do something like the following:
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
for element in today:
    span = element.find_elements(By.XPATH, './/span[contains(text(),"the_text_in_span")]')
    if not span:
        print("current element doesn't contain the desired span with text")
Using find_elements here ensures your code doesn't raise an exception when some element has no span with the desired text. It returns a list of WebElements, or an empty list when nothing matches. Python treats an empty list as false and a non-empty list as true.
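The empty-list check can be tried without a browser. The sketch below swaps Selenium for the standard library's xml.etree.ElementTree, whose findall() also returns an empty list when nothing matches; the table markup and the wanted text are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table markup, invented for this sketch.
table = ET.fromstring("""
<table>
  <tr id="r1"><td><span>ok</span></td></tr>
  <tr id="r2"><td><span>other</span></td></tr>
  <tr id="r3"><td>no span here</td></tr>
</table>
""")

wanted = "ok"
missing = []
for row in table.findall("tr"):
    # Collect the spans inside this row whose text matches.
    spans = [s for s in row.findall(".//span") if s.text == wanted]
    # An empty list is falsy, so "not spans" means no matching span was found.
    if not spans:
        missing.append(row.get("id"))

print(missing)
```

With Selenium the loop body would call element.find_elements(...) instead, but the truthiness test works the same way.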

List Append Creates two-dimensional instead of one-dimensional List from Beautifulsoup.find_all

I am parsing a website with beautifulsoup in python, and after finding all elements, I want to strip the digits from the result list and add them to a list:
## find all prices on page
prices = soup.find_all("div", class_="card-footer")
#print(prices)
## extract digits
stripped = [] # declare empty list
for p in prices:
    print(p.get_text(strip=True))
    stripped.append(re.findall(r'\d+', p.get_text(strip=True)))
print(stripped)
Result:
[['555'], ['590'], ['599'], ['1000'], ['5000'], ['5000'], ['9999'], ['10000'], ['12000']]
How do I have to do it, to end up with a one-dimensional list only?
Since I only need the "stripped" list, maybe there is also an easier way to extract digits other than using re.findall and do it directly in the line prices = soup.find_all("div", class_="card-footer")?
Thanks!
re.findall returns a list, so appending its result adds a whole list each time and you end up with a nested list. If you're only interested in the first match (there is probably only one in your case), take index 0:
stripped.append(re.findall(r'\d+', p.get_text(strip=True))[0])
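If a price text can contain more than one number, or none at all, indexing with [0] either drops matches or raises an IndexError. A hedged alternative is list.extend(), which adds the matches themselves rather than the list that holds them. The sample strings below are made up for illustration.

```python
import re

# Made-up price texts standing in for p.get_text(strip=True).
texts = ["Price: 555 EUR", "590", "from 1000"]

nested = []
flat = []
for text in texts:
    digits = re.findall(r"\d+", text)
    nested.append(digits)  # appends the list itself, giving a list of lists
    flat.extend(digits)    # appends each match, giving a flat list

# The same flat result as a single comprehension.
flat_comprehension = [d for text in texts for d in re.findall(r"\d+", text)]

print(nested)
print(flat)
```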

Iterate through list of elements and keep context of parent element in xpath selection

I have a list of elements which I retrieve through find_elements_by_xpath
results = driver.find_elements_by_xpath("//*[contains(@class, 'result')]")
Now I want to iterate through all the elements returned and find specific child elements
for element in results:
    field1 = element.find_elements_by_xpath("//*[contains(@class, 'field1')]")
My problem is that the context for the XPath selection gets ignored in the iteration, so field1 always returns the first element with the field1 class on the page, regardless of the current element.
As @Andersson posted, the fix is quite simple. All that was needed was the dot at the beginning of the expression, which makes the search relative to the current element:
for element in results:
    field1 = element.find_elements_by_xpath(".//*[contains(@class, 'field1')]")
It's easier to use CSS selectors (less typing) and find all the elements at once:
for element in driver.find_elements_by_css_selector(".result .field1"):
    field1 = element
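The difference between a page-wide search and a search scoped to one element can be sketched without a browser. This example uses the standard library's xml.etree.ElementTree in place of Selenium. ElementTree supports only a subset of XPath (no contains()), so it matches the class attribute exactly, and the markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, invented for this sketch.
page = ET.fromstring("""
<body>
  <div class="result"><p class="field1">a</p></div>
  <div class="result"><p class="field1">b</p></div>
</body>
""")

results = page.findall(".//div[@class='result']")

# Searching from the document root finds every field1 on the page.
all_fields = [p.text for p in page.findall(".//p[@class='field1']")]

# Searching from each result with a leading "." stays inside that result.
per_result = [
    [p.text for p in result.findall(".//p[@class='field1']")]
    for result in results
]

print(all_fields)
print(per_result)
```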

How to nest loop number into an xpath in python?

I have the xpath to follow a user on a website in selenium. Here is what I thought of doing so far:
followloop = [1,2,3,4,5,6,7,8,9,10]
for x in followloop:
    driver.find_element_by_xpath("/html/body/div[7]/div/div/div[2]/div[>>>here is where i want the increment to increase<<<]/div[2]/div[1]/div/button").click()
The spot I marked in the code is where I want the number to go up in increments. Also, as you can see, with the for loop I'm doing 1,2,3,4,5... Can I code it more simply, like 1-100? I don't just want 1-10; I will want 1 to some higher number.
I tried to just put x where I want it, but realised that Python won't pick up that it's a variable and will just treat it as part of the XPath. So how do I make it insert the increasing number there on each loop?
You need to convert the index from the for loop into a string and use it in your xpath:
follow_loop = range(1, 11)
for x in follow_loop:
    xpath = "/html/body/div[7]/div/div/div[2]/div["
    xpath += str(x)
    xpath += "]/div[2]/div[1]/div/button"
    driver.find_element_by_xpath(xpath).click()
Also, there will generally be a neater way of selecting an element than an absolute XPath like /html/body/div[7]/div/div/div[2]. Try to select an element by class or by id, e.g.:
//div[@class="a-classname"]
//div[@id="an-id-name"]
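I would use a '%s' placeholder for your task and range() as indicated in previous comments. The % formatting has to be applied to the string itself, before .click() is called, and XPath positions start at 1:
for x in range(1, 101):
    driver.find_element_by_xpath("/html/body/div[7]/div/div/div[2]/div[%s]/div[2]/div[1]/div/button" % x).click()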
Use a format string.
And use range() instead of a hand-written list (in Python 2, xrange() is better for larger ranges). It does exactly what you want. Since XPath positions start at 1, start the range at 1 as well:
for x in range(1, 11):
    driver.find_element_by_xpath("/html/body/div[7]/div/div/div[2]/div[%d]/div[2]/div[1]/div/button" % (x,)).click()
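The string-building step can be checked on its own before involving a browser. The sketch below uses str.format() with the XPath from the question; the names XPATH_TEMPLATE and button_xpaths are hypothetical helpers introduced only for this illustration.

```python
# XPath from the question, with a named placeholder where the index goes.
XPATH_TEMPLATE = (
    "/html/body/div[7]/div/div/div[2]/div[{index}]/div[2]/div[1]/div/button"
)


def button_xpaths(count):
    """Return the XPaths for positions 1..count (XPath positions are 1-based)."""
    return [XPATH_TEMPLATE.format(index=i) for i in range(1, count + 1)]


xpaths = button_xpaths(3)
for xpath in xpaths:
    print(xpath)
```

Each generated string could then be passed to the driver's find call inside the loop, as in the answers above.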

How do I access the first item(or xth item) in a PyQuery query?

I have a query for one of my tests that returns 2 results.
Specifically the 3rd level of an outline found using
query = html("ul ol ul")
How do I select the first or second unordered list?
query[0]
decays to an HTMLElement (a plain element rather than a PyQuery object)
list(query.items())[0]
or
query.items().next() #(in case of the first element)
Is there any better way that I can't see?
Note:
query = html("ul ol ul :first")
gets the first element of each list not the first list.
From the PyQuery documentation on traversing, you should be able to select the first unordered list by using:
query('ul').eq(0)
Thus the second unordered list can be obtained by using:
query('ul').eq(1)
In jQuery one would use
html("ul ol ul").first()
.first() - Reduce the set of matched elements to the first in the set.
or
html("ul ol ul").eq(0)
.eq() - Reduce the set of matched elements to the one at the specified index.
