I have a page structure similar to this:
<html>
<head/>
<frameset>
<frame/>
<frameset id="id1">
<frame/>
<frame id="id2">
<html>
<head/>
<body class="class1">
<form id="id3">
<input/>
<input/>
<input/>
<input/>
<table/>
<table/>
<table/>
<div id="id4">
<div id="id5">
<table id="id6">
<thead/>
<tbody>
<tr/>
<tr/>
<tr/>
<tr>
<td/>
<td/>
<td>
Text
I need to click a dynamic link: its URL and its position inside the table vary, but its text is always the same.
I've tried using find_element_by_link_text, and it fails.
With XPath, it cannot even find the form element.
Thank you.
You need to switch to the frame containing the <a> element first. Your code would look something like this:
driver.switch_to_frame('id2')
driver.find_element_by_link_text('TEXT').click()
Note that the above code is only an approximation, since the HTML you provided is only an approximation. Nested <frameset> elements only describe layout and need no switching of their own, but if the target document sits in a frame that is itself inside another frame's document, you'll need multiple calls to switch_to_frame to navigate down the frame hierarchy until your focus is on the frame containing the document with the element you're looking for.
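As a rough sketch of walking down a frame hierarchy, the helper below uses the Selenium 4 spelling (driver.switch_to.frame and driver.find_element). The function name click_link_in_frames and the frame_path argument are made up for illustration, and the frame id 'id2' in the usage comment is assumed from the markup in the question.

```python
def click_link_in_frames(driver, frame_path, link_text):
    """Switch into each frame in frame_path (outermost first), then click a link.

    Hypothetical helper: `driver` is any Selenium WebDriver instance.
    """
    # Always start from the top-level document so the path is unambiguous.
    driver.switch_to.default_content()
    for frame_reference in frame_path:
        # switch_to.frame accepts a frame id/name, an index, or a WebElement.
        driver.switch_to.frame(frame_reference)
    # "link text" is the string value of selenium's By.LINK_TEXT constant.
    driver.find_element("link text", link_text).click()


# Usage, assuming the link lives in the frame whose id is 'id2':
# click_link_in_frames(driver, ["id2"], "Text")
```

Passing the frames as a list keeps the single-frame case and the frame-inside-a-frame case on the same code path.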
You can first find all a tags in the page using:
find_elements_by_tag_name
Then iterate over the a tags and check each one's text, since the text is always the same:
a_tags = driver.find_elements_by_tag_name('a')
for a in a_tags:
    if a.text == 'TEXT':
        a.click()
        break  # stop after the click; the remaining elements may be stale
I have to automate some actions on a page.
The page consists of a table where each td element contains two a tags: the first has a class, the second has no class or id.
I can easily select the one with the class, but how do I get the other one? Is there a way to select the element next to another one, as in CSS?
This is a draft of the page structure:
<table>
<tr>
<td>
<a class="mylink"> element 1 </a>
<a>
<img src="">
</a>
</td>
</tr>
<tr>
<td>
<a class="mylink"> element 2 </a>
<a>
<img src="">
</a>
</td>
</tr>
</table>
I can select the first one with
fileLinkClass = "mylink"
driver.find_element(by=By.CLASS_NAME, value=fileLinkClass)
but I need to select and click the a link without the class. How can I accomplish this?
Thank you so much
You can use the XPath selector
'//td/a[2]'
to find the second a under each td.
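To see what a[2] picks, here is a small offline sketch that runs the same positional step against a well-formed copy of the table from the question. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, as a stand-in for the browser; the TABLE string and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's table (img tags self-closed for XML).
TABLE = """<table>
<tr><td><a class="mylink"> element 1 </a><a><img src=""/></a></td></tr>
<tr><td><a class="mylink"> element 2 </a><a><img src=""/></a></td></tr>
</table>"""

root = ET.fromstring(TABLE)

# a[2] means "the second a child of its parent", evaluated per td.
second_links = root.findall(".//td/a[2]")

for link in second_links:
    # Each match is the class-less link that wraps the image.
    print(link.get("class"), link.find("img") is not None)
```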
Try using css selector
For single element selection
driver.find_element(By.CSS_SELECTOR,'.mylink + a')
For multiple elements selection
driver.find_elements(By.CSS_SELECTOR,'.mylink + a')
Index into the returned list, then click. For example:
elements = driver.find_elements(By.CSS_SELECTOR, '.mylink + a')
elements[0].click()  # second link in the first row
elements[1].click()  # second link in the second row
I am trying to scrape an Instagram page and want to access the div tags inside the span tag, but I can't. The HTML of the Instagram page looks like this:
<head>--</head>
<body>
<span id="react-root" aria-hidden="false">
<form enctype="multipart/form-data" method="POST" role="presentation">…</form>
<section class="_9eogI E3X2T">
<main class="SCxLW o64aR" role="main">
<div class="v9tJq VfzDr">
<header class=" HVbuG">…</header>
<div class="_4bSq7">…</div>
<div class="fx7hk">…</div>
</div>
</main>
</section>
</body>
I do it like this:
from bs4 import BeautifulSoup
import urllib.request as urllib2
html_page = urllib2.urlopen("https://www.instagram.com/cherrified_/?hl=en")
soup = BeautifulSoup(html_page,"lxml")
span_tag = soup.find('span') # return span-tag correctly
span_tag.find_all('div') # return empty list, why ?
Please also give an example.
Instagram is a Single Page Application powered by React, which means its source is just a simple "empty" page that loads JavaScript to dynamically generate the content in the browser after downloading.
Click "View source" or go to view-source:https://www.instagram.com/cherrified_/?hl=en in Chrome. This is the HTML you download with urllib.request.
You can see that there is a single <span> tag, which does not include a <div> tag. (Note: <div> inside a <span> is not allowed).
Scraping instagram.com this way is not possible. It also might not be legal (I am not a lawyer).
Notes:
your HTML code example doesn't include a closing tag for <span>.
your HTML code example doesn't match the link you provide in the python snippet.
in the last line of the python snippet you probably meant span_tag.find_all('div') (note the variable name and the singular 'div').
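The difference between the downloaded source and the rendered DOM can be illustrated offline. The sketch below counts div tags nested inside a span using only the standard library's html.parser. Both markup strings are invented stand-ins (RAW_SOURCE for what an HTTP client receives, RENDERED_DOM for what the browser shows after JavaScript runs), not real Instagram output.

```python
from html.parser import HTMLParser


class DivInSpanCounter(HTMLParser):
    """Counts <div> start tags that occur while a <span> is still open."""

    def __init__(self):
        super().__init__()
        self.span_depth = 0
        self.div_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.span_depth += 1
        elif tag == "div" and self.span_depth > 0:
            self.div_count += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth > 0:
            self.span_depth -= 1


def count_divs_in_spans(markup):
    parser = DivInSpanCounter()
    parser.feed(markup)
    parser.close()
    return parser.div_count


# Hypothetical "empty shell" as served over HTTP.
RAW_SOURCE = ('<html><body><span id="react-root"></span>'
              '<script src="app.js"></script></body></html>')
# Hypothetical DOM after the script has populated the span.
RENDERED_DOM = ('<html><body><span id="react-root"><section><main>'
                '<div class="v9tJq"><div class="_4bSq7"></div>'
                '<div class="fx7hk"></div></div>'
                '</main></section></span></body></html>')

print(count_divs_in_spans(RAW_SOURCE), count_divs_in_spans(RENDERED_DOM))
```

A parser fed the raw source finds nothing to return, which matches the empty list in the question.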
XPath via lxml in Python has been making me run in circles. I can't get it to extract text from an HTML table despite having what I believe to be the correct XPath. I'm using Chrome to inspect and extract the XPath, then using it in my code.
Here is the HTML table taken directly from the page:
<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs" class="table table-striped vdp-feature-table">
<!-- Price -->
<tr>
<td><strong>Price:</strong></td>
<td>
<strong id="vehicle-detail-price" itemprop="price">$ 2,210.00</strong> </td>
</tr>
<!-- VIN -->
<tr><td><strong>VIN</strong></td><td> *0343</td></tr>
<!-- MILEAGE -->
<tr><td><strong>Mileage</strong></td><td>0 mi</td></tr>
</table>
I'm trying to extract the Mileage. The XPath I'm using is:
//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]
And the Python code that I'm using is:
import requests
from lxml import html

page = requests.get(URL)
tree = html.fromstring(page.content)
mileage = tree.xpath('//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]')
print(mileage)
Note: I've tried adding /text() to the end and I still get nothing back, just an empty list [].
What am I doing wrong and why am I not able to extract the table value from the above examples?
As Amber has pointed out, you should omit the tbody part.
Your XPath includes tbody, but there is no <tbody> tag in the HTML source of your table. Chrome inserts one into the DOM it displays, so XPaths copied from its inspector often contain it.
Using the HTML you posted, I am able to extract the mileage value with the following XPath:
tree.xpath('//*[@id="vehicle-detail-model-specs"]/tr[3]/td[2]')[0].text_content()
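The effect of the extra tbody step can be reproduced without a network call. This sketch evaluates both paths against a well-formed copy of the posted table, using the standard library's xml.etree.ElementTree as a stand-in for lxml (it supports only a subset of XPath, and paths must be relative, hence the leading dot). The SAMPLE string and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the table from the question (no <tbody> in the source).
SAMPLE = """<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs" class="table table-striped vdp-feature-table">
<tr><td><strong>Price:</strong></td><td><strong id="vehicle-detail-price">$ 2,210.00</strong></td></tr>
<tr><td><strong>VIN</strong></td><td> *0343</td></tr>
<tr><td><strong>Mileage</strong></td><td>0 mi</td></tr>
</table>
</div>"""

root = ET.fromstring(SAMPLE)

# Path as copied from the browser inspector: the tbody step matches nothing.
with_tbody = root.findall(".//*[@id='vehicle-detail-model-specs']/tbody/tr[3]/td[2]")

# Same path without tbody: reaches the mileage cell.
without_tbody = root.findall(".//*[@id='vehicle-detail-model-specs']/tr[3]/td[2]")

print(with_tbody)
print([cell.text for cell in without_tbody])
```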
I am trying to use Python Selenium Firefox Webdriver to grab the h2 content 'My Data Title' from this HTML
<div class="box">
<ul class="navigation">
<li class="live">
<span>
Section Details
</span>
</li>
</ul>
</div>
<div class="box">
<h2>
My Data Title
</h2>
</div>
<div class="box">
<ul class="navigation">
<li class="live">
<span>
Another Section
</span>
</li>
</ul>
</div>
<div class="box">
<h2>
Another Title
</h2>
</div>
Each div has a class of box so I can't easily identify the one I want. Is there a way to tell Selenium to grab the h2 in the box class that comes after the one that has the span called 'Section Details'?
If you want to grab the h2 in the box that comes after the one containing the span with the text Section Details, try the XPath below, using preceding:
(//h2[preceding::span[normalize-space(text()) = 'Section Details']])[1]
or using following:
(//span[normalize-space(text()) = 'Section Details']/following::h2)[1]
For Another Section, just change the span text in the XPath:
(//h2[preceding::span[normalize-space(text()) = 'Another Section']])[1]
or
(//span[normalize-space(text()) = 'Another Section']/following::h2)[1]
Here is an XPath to select the title following the text "Section Details":
//div[@class='box'][normalize-space(.)='Section Details']/following::h2
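For intuition about what "the first h2 following the span" means, here is a plain-Python emulation that walks a well-formed copy of the page in document order. It is not an XPath engine: the standard library's xml.etree.ElementTree has no following axis, so the function first_h2_after_span (a made-up name) imitates it by remembering when the span has been seen.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the markup from the question, wrapped in one root.
PAGE = """<body>
<div class="box"><ul class="navigation"><li class="live"><span>
Section Details
</span></li></ul></div>
<div class="box"><h2>
My Data Title
</h2></div>
<div class="box"><ul class="navigation"><li class="live"><span>
Another Section
</span></li></ul></div>
<div class="box"><h2>
Another Title
</h2></div>
</body>"""


def first_h2_after_span(root, span_text):
    """Return the text of the first h2 that comes after the matching span."""
    seen_span = False
    for element in root.iter():  # iter() yields elements in document order
        # Collapse whitespace, similar to XPath's normalize-space().
        text = " ".join((element.text or "").split())
        if element.tag == "span" and text == span_text:
            seen_span = True
        elif seen_span and element.tag == "h2":
            return text
    return None


root = ET.fromstring(PAGE)
print(first_h2_after_span(root, "Section Details"))
print(first_h2_after_span(root, "Another Section"))
```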
Yeah, you need to do some complicated XPath searching:
referenceElementList = driver.find_elements_by_xpath("//span")
for eachElement in referenceElementList:
    if eachElement.get_attribute("innerHTML") == 'Section Details':
        elementYouWant = eachElement.find_element_by_xpath("../../../following-sibling::div/h2")
elementYouWant.get_attribute("innerHTML") should give you "My Data Title"
My code reads:
find all span elements regardless of where they are in the HTML and store them in a list called referenceElementList;
iterate over the span elements in referenceElementList one by one, looking for a span whose innerHTML is 'Section Details';
if there is a match, we have found the span, so we navigate up three levels to the enclosing div[@class='box'] and move to that div's next sibling, which is the second div element;
lastly, we locate the h2 element inside that sibling div.
Can you please tell me if my code works? I might have gone wrong somewhere navigating backwards.
One potential difficulty: innerHTML may contain tab, newline, and space characters. In that case, you need to strip the whitespace (for example with strip() or a regex) before comparing.
I am currently working on web automation via Selenium.
I have an HTML file where the relevant part is this:
<table>
<tbody>
<tr>
<td class="tabon" nowrap="">
<div class="tabon">
<a id="tab" href="(long dynamically generated string)">
<b>Main Page</b>
</a>
</div>
</td>
<td class="taboff" nowrap="">
<div class="taboff">
<a id="tab" href="(another long string)">Info</a>
</div>
</td>
</tr>
</tbody>
</table>
I want to be able to access the second tab. Using Selenium I can't actually "click" on the div tag.
try:
    browser.find_element_by_xpath(
        '//table/tbody/tr/td[2]/div/a').click()
except NoSuchElementException:
    print('error')
This always results in an error. It has something to do with the fact that when the div tag is interacted with, it clicks on the URL anchor which changes the div such that the clicked on tag has a "tabon" property. How can Selenium mimic this?
EDIT: I neglected to note that the class with "tabon" has the title of the page in a separate bold tag.
Try this code, in case the tab "My Info" is visible on the webpage:
browser.find_element_by_xpath("//a[.='My Info']").click()
This will click the a element whose exact text is My Info.
You need to click the a tag, not the div. In addition to the solution Subh provided, you can use .taboff a as a CSS selector. It walks down to the a tag inside the second td of the pasted HTML.