I have the following html code:
<div id="category"> <!-- parent div -->
<div class="username"> <!-- n-number of elements of class username, all inside the parent div -->
<a rel="" href="link" title="smth">click</a>
</div>
</div>
I want to get all the links within the elements of class username, but only those inside the parent div with id="category". When I execute the code below it doesn't work: I can only access the title attribute by default, but I can't extract the link. Does anyone have a solution?
a = driver.find_element_by_id('category').find_elements_by_class_name("username")
links = [x.get_attribute("href") for x in a]
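Your code collects the username divs themselves, and a div has no href, so get_attribute("href") returns None. Select the anchors inside them instead, using the following CSS selector, which returns every a tag directly under a .username div that is itself directly under #category:
from selenium.webdriver.common.by import By
links = [x.get_attribute("href") for x in driver.find_elements(By.CSS_SELECTOR, "#category > .username > a")]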
Or, on Selenium versions before 4.3, which still have the find_elements_by_* helpers:
links = [x.get_attribute("href") for x in driver.find_elements_by_css_selector("#category > .username > a")]
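The selector itself can be checked without a browser. A minimal sketch, assuming BeautifulSoup (bs4) is installed: it runs the same CSS selector through bs4's select() against a made-up copy of the markup. The hrefs /u/1, /u/2 and the extra username div outside #category are hypothetical, added to show that it is excluded. Note that Selenium's get_attribute("href") returns the resolved absolute URL, whereas bs4 returns the raw attribute value.

```python
from bs4 import BeautifulSoup

# Made-up markup: two username divs inside #category, one outside it
html = """
<div id="category">
  <div class="username"><a rel="" href="/u/1" title="first">click</a></div>
  <div class="username"><a rel="" href="/u/2" title="second">click</a></div>
</div>
<div class="username"><a href="/outside" title="outside">click</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as the Selenium answer: a directly under .username directly under #category
links = [a["href"] for a in soup.select("#category > .username > a")]
print(links)  # ['/u/1', '/u/2']
```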
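Hello, I have this little script:
from bs4 import BeautifulSoup
from parsel import Selector  # assumption: Selector comes from parsel (Scrapy's selector library)
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")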
parse = BeautifulSoup(html, 'html.parser')
selector = Selector(text=html)
divs = selector.css('.panel .panel-heading a::attr(href)').getall()
It works fine, but if the div has
<div class="panel grey">
I don't want it to match. I only want exact matches, where the div has a single class.
Match only this:
<div class="panel">
I tried using the decompose() function but it didn't work in my case. What is the best solution? My script is done; this is the only issue.
In short: find the children of a div only if the div has exactly one class.
To match only a div whose class attribute is exactly panel, rather than any element whose class list merely contains panel, state that explicitly with an attribute selector.
Instead of
divs = selector.css('.panel .panel-heading a::attr(href)').getall()
try using
divs = selector.css('div[class="panel"] .panel-heading a::attr(href)').getall()
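As a minimal sketch of the difference, assuming BeautifulSoup is installed, the two selectors can be compared with bs4's select() in place of parsel's css() (a swap of library; the CSS semantics are the same here). The markup and hrefs below are made up. The class selector .panel matches any element whose class list contains panel, while div[class="panel"] compares the whole class attribute value.

```python
from bs4 import BeautifulSoup

# Made-up markup: one plain panel, one "panel grey"
html = """
<div class="panel">
  <div class="panel-heading"><a href="/keep">keep</a></div>
</div>
<div class="panel grey">
  <div class="panel-heading"><a href="/skip">skip</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Class selector: matches any div whose class list contains "panel"
loose = [a["href"] for a in soup.select(".panel .panel-heading a")]

# Attribute selector: matches only when the whole class attribute equals "panel"
strict = [a["href"] for a in soup.select('div[class="panel"] .panel-heading a')]

print(loose)   # ['/keep', '/skip']
print(strict)  # ['/keep']
```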
I have two divs that look like this:
<div id="RightColumn">
<div class="profile-info">
<div class= "info">
</div>
<div class="title">
</div>
</div>
</div>
How do I target the inner div with class "title"? It appears multiple times on the page, but the one I need is inside "RightColumn".
Here is the code I tried:
mainDIV = driver.find_element_by_id("RightColumn")
targetDIV = mainDIV.find_element_by_xpath('//*[@class="title"]').text
Unfortunately, the code above still searches all title divs on the page instead of only the one within mainDIV.
//div[@id='RightColumn']//child::div[@class='title']
This should get the job done.
It first uses the id RightColumn to target the outer div, then selects the div with class title that is a descendant of it.
This will select the first title div under this element:
mainDIV.find_element_by_xpath('.//div[@class="title"]')
However, this will select the first title div on the whole page, because an XPath starting with // searches from the document root even when it is called on an element:
mainDIV.find_element_by_xpath('//div[@class="title"]')
Try:
targetDIV = mainDIV.find_element_by_xpath('.//div[@class="title"]').text
Note that as of Selenium 4.0.0 the find_element_by_* methods are deprecated (they were removed in 4.3.0) and should be replaced with find_element():
from selenium.webdriver.common.by import By
targetDIV = mainDIV.find_element(By.XPATH, './/div[@class="title"]').text
Reference:
WebDriver API - find_element_by_xpath
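Selenium evaluates the XPath inside the browser, so the scoping rule can't be reproduced here without a driver. As a rough offline analogue, Python's standard-library xml.etree.ElementTree supports a subset of XPath, including the relative .// form. The sketch below uses a made-up page with one extra title div outside RightColumn, and searches once from the page root and once from the RightColumn element. (ElementTree rejects absolute paths starting with / when called on an element, so only the relative form is shown.)

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page: one title div outside RightColumn, one inside
page = ET.fromstring("""
<html><body>
  <div class="title">Page-level title</div>
  <div id="RightColumn">
    <div class="profile-info">
      <div class="info"></div>
      <div class="title">Right column title</div>
    </div>
  </div>
</body></html>
""")

# Searching from the root finds every title div in document order
all_titles = [d.text for d in page.findall(".//div[@class='title']")]

# Locate the container first, then search relative to it with ".//"
right_column = page.find(".//div[@id='RightColumn']")
scoped_titles = [d.text for d in right_column.findall(".//div[@class='title']")]

print(all_titles)     # ['Page-level title', 'Right column title']
print(scoped_titles)  # ['Right column title']
```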
I wrote the following script to try to extract the content of the div tags with the class makers:
import requests
from bs4 import BeautifulSoup as soup

phone_category_data = requests.get(phone_category_url)
base_category_soup = soup(phone_category_data.content, "html.parser")
div_list = base_category_soup.find_all("div")
for div in div_list:
    if div.get("class"):  # div["class"] raises KeyError for a div without a class attribute
        if div["class"][0] == "makers":
            print(div.text)
A common way to check for class names when locating elements would be to use:
base_category_soup.find_all("div", class_="makers")
Or, using a CSS selector:
base_category_soup.select("div.makers")
Note that since class is a multi-valued attribute and BeautifulSoup has special handling for it, both approaches match if any of the class values is makers, e.g. all of the following would match:
<div class="makers"></div>
<div class="test makers"></div>
<div class="makers test"></div>
<div class="test1 makers test2"></div>
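A minimal sketch of this multi-valued behavior, assuming bs4 is installed. The markup reuses the four examples above (with made-up text a to d) plus one non-matching div. The last list reproduces the question's check on div["class"][0], which misses divs where makers is not the first class.

```python
from bs4 import BeautifulSoup

html = """
<div class="makers">a</div>
<div class="test makers">b</div>
<div class="makers test">c</div>
<div class="test1 makers test2">d</div>
<div class="other">e</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Both approaches match if any class value is "makers"
by_find_all = [d.text for d in soup.find_all("div", class_="makers")]
by_select = [d.text for d in soup.select("div.makers")]

# The question's approach only inspects the first class value
first_class_only = [
    d.text for d in soup.find_all("div") if d.get("class") and d["class"][0] == "makers"
]

print(by_find_all)       # ['a', 'b', 'c', 'd']
print(first_class_only)  # ['a', 'c']
```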
While learning the BeautifulSoup library and crawling a web page, I can narrow the search results by filtering on the tag and its attributes, e.g. tag a with class name user-name, which I found by inspecting the HTML source.
Here is a successful example:
<a href="https://thenewboston.com/profile.php?user=2" class="user-name">
Bucky Roberts </a>
I can easily write:
soup = BeautifulSoup(plain_text, 'html.parser')
for link in soup.find_all('a', {'class': 'user-name'}):
    print(link['href'])
However, when I try to get the profile photo's link, I see the code below by inspecting:
<div class="panel profile-photo">
<a href="https://thenewboston.com/profile.php?user=2">
<img src="/photos/users/2/resized/869b40793dc9aa91a438b1eb6ceeaa96.jpg" alt="">
</a>
</div>
In this case the img tag that holds the .jpg link has no class or id of its own to refer to. How can I get the .jpg link for each user?
You can use the img element's parent elements to build your locator. I would use the following CSS selector, which matches img elements directly under a elements that are themselves directly under the element with the profile-photo class:
soup.select(".profile-photo > a > img")
To get the src values:
for image in soup.select(".profile-photo > a > img"):
    print(image['src'])
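A minimal sketch applying that selector to the markup from the question, assuming bs4 is installed. The src values are relative, so the sketch also resolves them with the standard library's urllib.parse.urljoin; the base URL https://thenewboston.com/ is an assumption taken from the profile link in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

html = """
<div class="panel profile-photo">
  <a href="https://thenewboston.com/profile.php?user=2">
    <img src="/photos/users/2/resized/869b40793dc9aa91a438b1eb6ceeaa96.jpg" alt="">
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the page was fetched from thenewboston.com, so relative src values resolve against it
base_url = "https://thenewboston.com/"

# img directly under a, directly under the element with class profile-photo
photo_urls = [urljoin(base_url, img["src"]) for img in soup.select(".profile-photo > a > img")]
print(photo_urls)
```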
<a href="link" class="link_to_img">
<img src="doesn't matter">
</a>
<span>
<a href="link" class="link_to_img">Title Of Image</a>
</span>
As you can see, there are two <a href="link" class="link_to_img"></a> tags. When I try to get the link, e.g. with href.findAll('a', {'class': 'link_to_img'}), it gets the link, but it duplicates it and I only need it once. Is there a way to target the <a></a> inside the <span></span>?
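You can use the limit argument in find_all(). Note that this returns the first matching link in the document (here the one wrapping the image), not specifically the one inside the span:
find_all("a", limit=1)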
You can select your link with a span > a CSS selector which would match an a tag directly inside a span tag:
soup.select("span > a")
You can additionally check the class:
soup.select("span > a.link_to_img")
Recent beautifulsoup4 versions also provide select_one(), which returns a single Tag instance (or None if nothing matches) instead of a list:
soup.select_one("span > a.link_to_img")
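A minimal sketch comparing the approaches on the question's markup, assuming bs4 is installed. It shows why the class filter alone yields the href twice, that limit=1 picks the image link rather than the span link, and that the span > a selector isolates the one inside the span.

```python
from bs4 import BeautifulSoup

html = """
<a href="link" class="link_to_img">
  <img src="doesn't matter">
</a>
<span>
  <a href="link" class="link_to_img">Title Of Image</a>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Filtering on the class alone matches both anchors, hence the "duplicate" href
every_link = [a["href"] for a in soup.find_all("a", class_="link_to_img")]

# limit=1 just returns the first anchor in the document: the one wrapping the image
first_anchor = soup.find_all("a", limit=1)[0]

# Only anchors with class link_to_img that are direct children of a <span>
span_links = soup.select("span > a.link_to_img")

# select_one returns the first match as a Tag, or None if nothing matches
title_link = soup.select_one("span > a.link_to_img")

print(every_link)             # ['link', 'link']
print(title_link.get_text())  # Title Of Image
```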