Get XPath sibling text value - Python Selenium

Requirement:
/html/body/div[3]/div[4]/div/div[7]/div/div/div/div/p/b - Contains word "TITLE"
/html/body/div[3]/div[4]/div/div[8]/div/div/div/div/p - Contains "This is my description"
Actual HTML:
<div class="secadvheading section">
<div class="section-custom">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p class="mt-15"><b>TITLE</b></p>
</div>
</div>
</div>
</div>
</div>
<div class="paragraphText parbase section">
<div class="section-custom ">
<div class="container-fluid">
<div class="row">
<div class="col-md-12">
<p>This is my desciption</p>
</div>
</div>
</div>
</div>
Question:
How do I get the paragraph text content after the "TITLE" div?
Tried
driver.find_element_by_xpath("//*[contains(text(),'TITLE')]/following-sibling::p")
didn't work. I may have multiple "TITLE" elements on the same page; how can I gracefully look for each TITLE div (multiple elements) and get the description that goes with it?

You need to go out of TITLE's node first: go up to the ancestor node, then use following-sibling. Try this:
//b[text()='TITLE']/ancestor::div[@class='secadvheading section']/following-sibling::div[@class='paragraphText parbase section']//p
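If there can be multiple "TITLE" blocks on the same page, you can pair each heading with its own description by running the same XPath through find_elements. A minimal sketch, assuming Selenium 4's find_elements(By.XPATH, ...) API and a placeholder URL; the [1] predicate is added so each heading takes only its nearest description block:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# For every TITLE heading, climb to its section div, then take the
# nearest following description section and read its paragraph.
xpath = ("//b[text()='TITLE']"
         "/ancestor::div[@class='secadvheading section']"
         "/following-sibling::div[@class='paragraphText parbase section'][1]"
         "//p")

for description in driver.find_elements(By.XPATH, xpath):
    print(description.text)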

Related

Best approach to get attribute text with BeautifulSoup

What would be the best way to get the text of the items with class="field__label" and class="field__item" in the following code?
Note that there are other tags with the same classes outside the div class="fieldset-wrapper"; I only need the ones inside this tag.
HTML Example:
<div class="fieldset-wrapper">
<div class="field field--name-field-adresse-strasse-nr field--type-string field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
<div class="field field--name-field-adresse-plz-ort field--type-string field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
<div class="field field--name-field-adressen-bundesland field--type-entity-reference field--label-inline clearfix">
<div class="field__label">TEXT</div>
<div class="field__item">TEXT</div>
</div>
</div>
You can use css selectors to ensure that your target elements are descendants of the div class="fieldset-wrapper" element:
for item in soup.select('div.fieldset-wrapper div.field__item, div.fieldset-wrapper div.field__label'):
    print(item.text)
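For completeness, a self-contained sketch (the HTML string is just a shortened copy of the snippet from the question; how you fetch the page is up to you):

from bs4 import BeautifulSoup

html = """<div class="fieldset-wrapper">
  <div class="field field--name-field-adresse-strasse-nr field--type-string field--label-inline clearfix">
    <div class="field__label">TEXT</div>
    <div class="field__item">TEXT</div>
  </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Restricting the selector to descendants of div.fieldset-wrapper skips
# same-class elements elsewhere on the page.
for item in soup.select("div.fieldset-wrapper div.field__label, "
                        "div.fieldset-wrapper div.field__item"):
    print(item.get_text(strip=True))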

Python Selenium get nth parent and nth child

I am looking for a solution to get the second parent (div) of a known element and then get a child element with sub-children in Selenium with Python.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
<div class="p1">
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<span class="target">b_Kunden</span>
</div>
</div>
<div class="p2">...</div>
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<div class="p4">
<div class="p5">
<div class="p6">
<button type="button" class="b1">button i want to click</button>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="p1">
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<span class="target">different</span>
</div>
</div>
<div class="p2">...</div>
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<div class="p4">
<div class="p5">
<div class="p6">
<button>some button</button>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="p1">
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<span class="target">different</span>
</div>
</div>
<div class="p2">...</div>
<div class="p2">...</div>
<div class="p2">
<div class="p3">
<div class="p4">
<div class="p5">
<div class="p6">
<button>some button</button>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
I am getting the element I am looking for with xpath and checking for a specific text ("b_Kunden")(working fine):
temp = browser.find_elements_by_xpath("//*[contains(text(), 'b_Kunden')]")[0]
(I cannot access it through a simple className etc.; this is just dummy HTML.) In my environment I need to go up to the p1 class from where the span text is "b_Kunden" and then down to the child element p6 to click the button inside it.
The reason for this is that I need to press exactly the button of the section where the span text is b_Kunden. Because the number of sections is variable, I cannot count them and access them through, for example, the [1] operator on classes. I need to find the term "b_Kunden" and press the button in the p1 section related to it.
I would be glad if someone could help me out on how to solve this issue.
Best regards,
Liam
This might help you (assuming that the 2nd parent you referred to is going backwards in the hierarchy, i.e. higher up the DOM):
browser.find_elements_by_xpath("//*[contains(text(), 'b_Kunden')][1]//parent::div//parent::div")
UPDATE (code per the comment and HTML block provided by @Liam)
//*[text()='b_Kunden']//ancestor::div//button
Here is the HTML with the DOM path to the button highlighted. I hope this is what you are looking for.
HTML DOM highlighted snapshot
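For reference, the same idea as Python code, reusing the browser instance from the question (a sketch assuming Selenium 4's find_element(By.XPATH, ...); with older Selenium versions, find_element_by_xpath takes the same XPath):

from selenium.webdriver.common.by import By

# Climb from the matching span up to its section container (div.p1),
# then back down to the button inside that same section.
button = browser.find_element(
    By.XPATH,
    "//span[contains(text(), 'b_Kunden')]"
    "/ancestor::div[@class='p1']"
    "//button",
)
button.click()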

Find xpath with following-siblings and contains text in Python Selenium

I know the very basics of using following-sibling, but here I have a situation that looks a bit more complicated.
I want to find the element with the text Total 6.5 where the header is Total games. How can I do it with following-sibling and contains text?
<div class="group">
<div class="header_1">
<div class="section_1">
<div class="expander"></div>
<div class="star"></div>
<div class="text_3">Total games</div>
</div>
</div>
<div class="body_1">
<div class="horizontal">
<div class="grid">
<div class="row_common">
<div class="cell_wrap">
<div class="cell_align_wrap">
<div class="common_text">Total 6.5</div>
</div>
</div>
</div>
<div class="row_common">
...
</div>
</div>
</div>
</div>
</div>
This one should locate the required element:
//div[@class="header_1" and contains(., "Total games")]/following-sibling::div[@class="body_1"]//div[@class="common_text"]
You can also simplify it as:
//div[@class="header_1" and contains(., "Total games")]/following::div[@class="common_text"]

Extract table from html including images using Python

I am trying to download a table from HTML which is not in the usual td/tr format and includes images.
The html code looks like this:
<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>
I would like to get the table:
[Location 45 out of fifty points,
Service 45 out of fifty points].
The following code only prints "Location" and "Service" and does not include the rating.
for url in urls:
    r = requests.get(url)
    time.sleep(delayTime)
    soup = BeautifulSoup(r.content, "lxml")
    data17 = soup.findAll('div', {'class': 'dynamicBottom'})
    for item in data17:
        print(item.text)
And the code
data18= soup.find(attrs={'class': 'sprite-rating_s_fill rating_s_fill s45'})
print(data18["alt"] if data18 else "No meta title given")
does not help either: it only prints "45 out of fifty points", so it is not clear which category the rating belongs to. Additionally, the image class ('sprite-rating_s_fill rating_s_fill s45') varies in other tables depending on the rating.
Is there a way to extract the full table?
Or to tell Python to extract the image after a certain word, e.g. "Location"?
Thank you very much for your help!
html = '''<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<div class="improve_listing_btn ui_button primary small">improve this entry</div>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="ratingRow wrap"):
    text = div.text.strip()
    alt = div.find('img').get('alt')
    print(text, alt)
out:
Location 45 out of fifty points
Service 45 out of fifty points
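If the whitespace around div.text ever gets noisy, a small variant (same soup as above) reads the label span explicitly, which keeps the category/rating pairing obvious:

for div in soup.find_all('div', class_="ratingRow wrap"):
    label = div.find('span', class_="text").get_text(strip=True)
    alt = div.find('img').get('alt')
    print(label, alt)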

wrapping html with a python function

I want to be able to wrap a div based on its id. For example, given the following HTML:
<body>
<div id="info">
<div id="a1">
</div>
<div id="a2">
<div id="description">
</div>
<div id="links">
link
</div>
</div>
</div>
</body>
I want to write a Python function that takes a document, an id, and a selector, and wraps the element with the given id in the given document in a div with the given class or id selector. For example, let's say that the HTML above is in a variable doc. Then
wrap(doc,'#a2','#wrapped')
will return the following HTML:
<body>
<div id="info">
<div id="a1">
</div>
<div id="wrapped">
<div id="a2">
<div id="description">
</div>
<div id="links">
link
</div>
</div>
</div>
</div>
</body>
I looked at some XML parsers and Python's HTMLParser, but I have not found anything that lets me not only get everything inside a specific tag but also append strings and easily edit the document. If such a tool does not exist, what would be a good approach to this?
from BeautifulSoup import BeautifulSoup

# div1 is to be wrapped with div2
def wrap(doc, div1_id, div2_id):
    pool = BeautifulSoup(doc)
    for div in pool.findAll('div', attrs={'id': div1_id}):
        div.replaceWith('<div id=' + div2_id + '>' + div.prettify() + '</div>')
    return pool.prettify()

wrap(doc, 'a2', 'wrapped')
I recommend BeautifulSoup; it brings in a dependency, but also a lot of convenience. The following code can achieve the goal of the wrap:
from bs4 import BeautifulSoup
data = '''<body>
<div id="info">
<div id="a1">
</div>
<div id="a2">
<div id="description">
</div>
<div id="links">
link
</div>
</div>
</div>
</body>'''
soup = BeautifulSoup(data)
div = soup.find('div', attrs={'id': 'a2'})
div.wrap(soup.new_tag('div', id='wrapper'))
Then, printing soup.prettify(), we can see the result:
<html>
<body>
<div id="info">
<div id="a1">
</div>
<div id="wrapper">
<div id="a2">
<div id="description">
</div>
<div id="links">
<a href="http://example.com">
link
</a>
</div>
</div>
</div>
</div>
</body>
</html>
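If you want this packaged as the wrap() function from the question, a small sketch on top of bs4 could look like the following; the selector handling is deliberately simplified (a leading '#' is stripped and treated as an id), and data is the HTML string defined above:

from bs4 import BeautifulSoup

def wrap(doc, target_id, wrapper_id):
    # Wrap the element with id target_id in a new div with id wrapper_id.
    soup = BeautifulSoup(doc, "html.parser")
    target = soup.find(id=target_id.lstrip("#"))
    if target is not None:
        target.wrap(soup.new_tag("div", id=wrapper_id.lstrip("#")))
    return soup.prettify()

print(wrap(data, "#a2", "#wrapped"))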
