How to get a div with a particular class name - python

I wrote the following script to try to extract the content from the div tag with the class makers:
phone_category_data = requests.get(phone_category_url)
base_category_soup = soup(phone_category_data.content, "html.parser")
div_list = base_category_soup.find_all("div")
for div in div_list:
    if div.get("class"):
        if div['class'][0] == 'makers':
            print(div.text)

A common way to check for class names when locating elements would be to use:
base_category_soup.find_all("div", class_="makers")
Or, using a CSS selector:
base_category_soup.select("div.makers")
Note that since class is a multi-valued attribute and BeautifulSoup has special handling for it, both approaches check whether any of the class values is makers, e.g. all of the following would match:
<div class="makers"></div>
<div class="test makers"></div>
<div class="makers test"></div>
<div class="test1 makers test2"></div>
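A minimal runnable sketch (using a standalone HTML fragment assumed here) confirms this any-value matching behaviour:

```python
from bs4 import BeautifulSoup

# standalone fragment with the four matching divs plus one non-match
html = """
<div class="makers"></div>
<div class="test makers"></div>
<div class="makers test"></div>
<div class="test1 makers test2"></div>
<div class="other"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches when "makers" is any one of the element's class values
divs = soup.find_all("div", class_="makers")
print(len(divs))  # 4
```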

Related

How to find first children of an element without sub-children by tag name in Selenium (Python) [duplicate]

I'm trying to parse a html file. There are many nested divs in this html. I want to get all child divs, but not grandchildren etc.
Here is a pattern:
<div class='main_div'>
<div class='child_1'>
<div class='grandchild_1'></div>
</div>
<div class='child_2'>
...
...
</div>
</div>
So the command I'm looking for would return 2 elements - divs whose classes are 'child_1' and 'child_2'.
Is it possible?
I've tried to use main_div.find_elements_by_tag_name('div') but it returned all nested divs in the div.
Here is a way to find the direct div children of the div with class name "main_div":
driver.find_elements_by_xpath('//div[@class="main_div"]/div')
The key here is the single slash, which makes the search inside "main_div" non-recursive, finding only direct div children.
Or, with a CSS selector:
driver.find_elements_by_css_selector("div.main_div > div")
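The same direct-children idea can be sketched without a browser using BeautifulSoup's recursive=False (the HTML below is the question's pattern with the elided markup and closing tag filled in):

```python
from bs4 import BeautifulSoup

# the question's pattern, completed so it parses as well-formed HTML
html = """
<div class='main_div'>
  <div class='child_1'>
    <div class='grandchild_1'></div>
  </div>
  <div class='child_2'></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main_div = soup.find("div", class_="main_div")

# recursive=False restricts find_all to direct children only
children = main_div.find_all("div", recursive=False)
print([c["class"][0] for c in children])  # ['child_1', 'child_2']
```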

How to get links within a subelement in selenium?

I have the following html code:
<div id="category"> //parent div
<div class="username"> // n-number of elements of class username which all exist within parent div
<a rel="" href="link" title="smth">click</a>
</div>
</div>
I want to get all the links within the class username, BUT only those within the parent div where id=category. When I execute the code below it doesn't work. I can only access the title attribute by default but can't extract the link. Does anyone have a solution?
a = driver.find_element_by_id('category').find_elements_by_class_name("username")
links = [x.get_attribute("href") for x in a]
Use the following CSS selector, which will return all the anchor tags:
links = [x.get_attribute("href") for x in driver.find_elements(By.CSS_SELECTOR, "#category > .username > a")]
Or:
links = [x.get_attribute("href") for x in driver.find_elements_by_css_selector("#category > .username > a")]
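For a quick check without a browser, the same selector can be applied with BeautifulSoup (the href values below are made-up stand-ins for the question's markup):

```python
from bs4 import BeautifulSoup

# made-up stand-in: two username blocks inside #category, one outside
html = """
<div id="category">
  <div class="username"><a rel="" href="link1" title="smth">click</a></div>
  <div class="username"><a rel="" href="link2" title="smth">click</a></div>
</div>
<div class="username"><a href="elsewhere">outside the parent</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# only anchors under #category's username divs are collected
links = [a["href"] for a in soup.select("#category > .username > a")]
print(links)  # ['link1', 'link2']
```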

python BeautifulSoup, find only exact class matches

Hello, so I have this little script:
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
parse = BeautifulSoup(html, 'html.parser')
selector = Selector(text=html)
divs = selector.css('.panel .panel-heading a::attr(href)').getall()
and it works fine, but if a div has
<div class="panel grey">
I don't want it to match; I want exact matches only, i.e. match only when the div has the single class
<div class="panel">
I tried using the decompose() function but it didn't work in my case. What is the best solution? My script is otherwise done; this is the only issue.
So, in short: find the children of a div only if the div has exactly one class.
To strictly match a div whose class attribute is exactly panel, rather than any element that merely contains panel among its class values, you can write that condition explicitly.
Instead of
divs = selector.css('.panel .panel-heading a::attr(href)').getall()
try using
divs = selector.css('div[class="panel"] .panel-heading a::attr(href)').getall()
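The same exact-match idea can be sketched with BeautifulSoup's CSS support, where [class="panel"] also compares against the full attribute string (the HTML fragment below is a made-up example):

```python
from bs4 import BeautifulSoup

# made-up fragment: one exact "panel" div and one "panel grey" div
html = """
<div class="panel"><div class="panel-heading"><a href="keep">x</a></div></div>
<div class="panel grey"><div class="panel-heading"><a href="skip">y</a></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# [class="panel"] matches the whole attribute string, so "panel grey" is excluded
hrefs = [a["href"] for a in soup.select('div[class="panel"] .panel-heading a')]
print(hrefs)  # ['keep']
```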

How to target a specific div class within a specific parent div class, python selenium

I have two divs that look like this:
<div id="RightColumn">
<div class="profile-info">
<div class= "info">
</div>
<div class="title">
</div>
</div>
</div>
How do I target the internal div labelled "title"? It appears multiple times on the page but the one that I need to target is within "RightColumn".
Here is the code I tried:
mainDIV = driver.find_element_by_id("RightColumn")
targetDIV = mainDIV.find_element_by_xpath('//*[@class="title"]').text
Unfortunately the above code still pulls all title divs on the page vs the one I need within the mainDiv.
//div[@id='RightColumn']//child::div[@class='title']
This should get the job done.
First use the id RightColumn to target the outer div; the div with class title is then found among its descendants.
This will select the first title div under this element:
mainDIV.find_element_by_xpath('.//div[@class="title"]')
However, this will select the first title on the page:
mainDIV.find_element_by_xpath('//div[@class="title"]')
Try:
targetDIV = mainDIV.find_element_by_xpath('.//div[@class="title"]').text
Note as of Selenium 4.0.0, the find_element_by_* functions are deprecated and should be replaced with find_element().
targetDIV = mainDIV.find_element(By.XPATH, './/div[@class="title"]').text
Reference:
WebDriver API - find_element_by_xpath
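The './/' versus '//' distinction is easy to demonstrate outside of Selenium with lxml, which uses the same XPath semantics (the HTML below is a made-up stand-in for the question's markup):

```python
from lxml import etree

# made-up stand-in: one "title" div inside RightColumn, one outside
html = """
<div id="RightColumn">
  <div class="profile-info">
    <div class="info"></div>
    <div class="title">inside</div>
  </div>
</div>
<div class="title">outside</div>
"""

tree = etree.HTML(html)
main_div = tree.xpath('//div[@id="RightColumn"]')[0]

# './/' searches relative to main_div; '//' always searches the whole document
relative = main_div.xpath('.//div[@class="title"]')
absolute = main_div.xpath('//div[@class="title"]')
print(len(relative), len(absolute))  # 1 2
```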

Extract data-content from span tag in BeautifulSoup

I have such HTML code:
<li class="IDENTIFIER"><h5 class="hidden">IDENTIFIER</h5><p>
<span class="tooltip-iws" data-toggle="popover" data-content="SOME TEXT">
other text</span></p></li>
And I'd like to obtain the SOME TEXT from the data-content.
I wrote
target = soup.find('span', {'class' : 'tooltip-iws'})['data-content']
to get the span, and I wrote
identifier_elt= soup.find("li", {'class': 'IDENTIFIER'})
to get the class, but I'm not sure how to combine the two.
But the class tooltip-iws is not unique, and I would get extraneous results if I just used that (there are other spans, before the code snippet, with the same class).
That's why I want to specify my search within the class IDENTIFIER. How can I do that in BeautifulSoup?
Try using a CSS selector:
soup.select_one("li[class='IDENTIFIER'] > p > span")['data-content']
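Putting it together as a runnable sketch, using the question's own HTML fragment (with a slightly looser descendant selector that also scopes the span to the IDENTIFIER list item):

```python
from bs4 import BeautifulSoup

# the question's own HTML fragment
html = """
<li class="IDENTIFIER"><h5 class="hidden">IDENTIFIER</h5><p>
<span class="tooltip-iws" data-toggle="popover" data-content="SOME TEXT">
other text</span></p></li>
"""

soup = BeautifulSoup(html, "html.parser")

# scope the span lookup to the li with class IDENTIFIER
target = soup.select_one("li.IDENTIFIER span.tooltip-iws")["data-content"]
print(target)  # SOME TEXT
```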
Alternatively, try using selectorlib; it should solve your issue.
https://selectorlib.com/
