I am trying to get a link from another frame on our webpage. I select the frame and then try to click the link with:
Select Frame css=frame[name='submenu']
Click Link css=#navigation_user > tr:nth-child(2) > td:nth-child(1) > a:nth-child(1) > span:nth-child(1)
I would have used link=Users directly, but there is another link with the same text, which is a parent element of it, so I can't use that locator.
Any ideas on how I can access this link?
Excerpt of the html:
<frame src="sample.asp" name="submenu">
<tbody>
<tr>
<td class="navigation">
<a class="navigationheadline">
<span id="user" class="navigationheadline">Users</span>
</a>
</td>
</tr>
</tbody>
<tbody id="navigation_user">
<tr>
<td class="navigation">
<a href="UserSearch.asp?null=" target="main">
<span class="navigation" onclick="hideNv()">Users</span>
</a>
</td>
</tr>
</tbody>
</frame>
Thank you in advance!
Have you tried a different locator method, such as XPath or jQuery? I think the root of your problem is the locator you're using. How did you determine that the one you're using is correct? Was it just copied out of dev tools?
xpath=//*[@id="navigation_user"]/tr/td/a/span
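The effect of scoping the locator to id="navigation_user" can be checked offline. The following is only a sketch: it uses Python's standard xml.etree.ElementTree as a stand-in for the browser, and FRAME_XML is the excerpt from the question condensed into well-formed XML (both names are chosen here for illustration).

```python
import xml.etree.ElementTree as ET

# The frame excerpt from the question, kept as well-formed XML for this offline check.
FRAME_XML = """
<frame src="sample.asp" name="submenu">
  <tbody>
    <tr><td class="navigation">
      <a class="navigationheadline"><span id="user" class="navigationheadline">Users</span></a>
    </td></tr>
  </tbody>
  <tbody id="navigation_user">
    <tr><td class="navigation">
      <a href="UserSearch.asp?null=" target="main"><span class="navigation" onclick="hideNv()">Users</span></a>
    </td></tr>
  </tbody>
</frame>
"""

root = ET.fromstring(FRAME_XML)

# Both links carry the text "Users", which is why link=Users is ambiguous.
all_links = root.findall(".//a")

# Scoping the path to the id="navigation_user" block leaves only the real link.
scoped_links = root.findall(".//*[@id='navigation_user']/tr/td/a")

print(len(all_links), len(scoped_links))
print(scoped_links[0].get("href"))
```

In the test itself, the same scoping would presumably be written as a locator such as xpath=//*[@id="navigation_user"]//a, used after Select Frame.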
This is the html I have on a website:
<table class="table table-fixed table-header-right text-medium">
<tbody><tr><th class="no-border">Certification Number</th><td class="no-border">48487270</td></tr>
<tr>
<th>Label Type</th>
<td>
<img width="69" height="38" class="margin-right-min" alt="" aria-hidden="true" src="https://i.psacard.com/psacard/images/cert/table-image-ink.png" style="">
<span class="inline-block padding-top-min">with fugitive ink technology</span>
</td>
</tr>
<tr><th>Reverse Cert Number/Barcode</th><td>Yes</td></tr>
<tr><th>Year</th><td>2020</td></tr>
<tr><th>Brand</th><td>TOPPS</td></tr>
<tr><th>Sport</th><td>BASEBALL CARDS</td></tr>
<tr><th>Card Number</th><td>20</td></tr>
<tr><th>Player</th><td>ARISTIDES AQUINO</td></tr>
<tr><th>Variety/Pedigree</th><td></td></tr>
<tr><th>Grade</th><td>NM-MT 8</td></tr>
</tbody></table>
I am trying to find a way to read the year and store it in a variable. I normally find elements with XPath, but since these tags are repeated many times with no other identifying attributes, I am unsure how to go about this. The year will change, so I can't search by its text. Any help would be appreciated.
Use BeautifulSoup to find the <th> tag with the text 'Year'. Then find the next <td> tag and extract the text from that:
from bs4 import BeautifulSoup
html = '''<table class="table table-fixed table-header-right text-medium">
<tbody><tr><th class="no-border">Certification Number</th><td class="no-border">48487270</td></tr>
<tr>
<th>Label Type</th>
<td>
<img width="69" height="38" class="margin-right-min" alt="" aria-hidden="true" src="https://i.psacard.com/psacard/images/cert/table-image-ink.png" style="">
<span class="inline-block padding-top-min">with fugitive ink technology</span>
</td>
</tr>
<tr><th>Reverse Cert Number/Barcode</th><td>Yes</td></tr>
<tr><th>Year</th><td>2020</td></tr>
<tr><th>Brand</th><td>TOPPS</td></tr>
<tr><th>Sport</th><td>BASEBALL CARDS</td></tr>
<tr><th>Card Number</th><td>20</td></tr>
<tr><th>Player</th><td>ARISTIDES AQUINO</td></tr>
<tr><th>Variety/Pedigree</th><td></td></tr>
<tr><th>Grade</th><td>NM-MT 8</td></tr>
</tbody></table>'''
soup = BeautifulSoup(html, 'html.parser')
year = soup.find('th', string='Year').find_next('td').text
print(year)
Output:
2020
First, find the web elements with the driver.findElements function, using that class name.
You can then take an element from the returned list
by list.get(index) (in Python, elements[index]).
Or,
you can store all the td/th elements in a list and then search the list for the "Year" header; the cell that follows it holds the value you are looking for.
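The "collect the cells, then look up the header" idea can be sketched without a browser. This is only an illustration under stated assumptions: it uses Python's standard html.parser instead of Selenium, RowCollector is a hypothetical helper name, and the table is a shortened copy of the one in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <th>/<td> pair into a dict (header -> value)."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None     # "th" or "td" while inside a cell, otherwise None
        self._header = None   # text of the most recent <th>
        self._buffer = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buffer = []

    def handle_data(self, data):
        if self._cell is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        # Collapse runs of whitespace so multi-line cells become one clean string.
        text = " ".join("".join(self._buffer).split())
        if tag == "th":
            self._header = text
        elif self._header is not None:
            self.rows[self._header] = text
            self._header = None
        self._cell = None


html = """<table>
<tr><th>Label Type</th><td><img alt="" src="x.png">
<span>with fugitive ink technology</span></td></tr>
<tr><th>Year</th><td>2020</td></tr>
<tr><th>Brand</th><td>TOPPS</td></tr>
</table>"""

collector = RowCollector()
collector.feed(html)
year = collector.rows["Year"]
print(year)
```

Looking the value up by its header keeps working when the year changes, because only the "Year" label is assumed to be stable.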
I am trying to scrape some data off a website. The data that I want is listed in a table, but there are multiple tables and no IDs. My idea was to find the header just above the table I am looking for and use that as an indicator.
This has really troubled me, so as a last resort, I wanted to ask whether anyone knows how to use BeautifulSoup to find the table.
A snippet of the HTML code is provided beneath, thanks in advance :)
The table I am interested in is the table right beneath <h2>Mine neaste vagter</h2>
<h2>Min aktuelle vagt</h2>
<div>
<a href='/shifts/detail/595212/'>Flere detaljer</a>
<p>Vagt starter: <b>11/06 2021 - 07:00</b></p>
<p>Vagt slutter: <b>11/06 2021 - 11:00</b></p>
<h2>Masker</h2>
<table class='list'>
<tr><th>Type</th><th>Fra</th><th> </th><th>Til</th></tr>
<tr>
<td>Fri egen regningD</td>
<td>07:00</td>
<td> - </td>
<td>11:00</td>
</tr>
</table>
</div>
<hr>
<h2>Mine neaste vagter</h2>
<table class='list'>
<tr>
<th class="alignleft">Dato</th>
<th class="alignleft">Rolle</th>
<th class="alignleft">Tidsrum</th>
<th></th>
<th class="alignleft">Bytte</th>
<th class="alignleft" colspan='2'></th>
</tr>
<tr class="rowA separator">
<td>
<h3>12/6</h3>
</td>
<td>Kundeservice</td>
<td>18:00 → 21:30 (3.5 t)</td>
<td style="max-width: 20em;"></td>
<td>
<a href="/shifts/ajax/popup/595390/" class="swap shiftpop">
Byt denne vagt
</a>
</td>
<td><a href="/shifts/detail/595390/">Detaljer</td>
<td>
</td>
</tr>
Here are two approaches to find the correct <table>:
Since the table you want is the last one in the HTML, you can use find_all() and the index [-1] to get the last table:
print(soup.find_all("table", class_="list")[-1])
Find the h2 element by its text, and then use the find_next() method to find the table:
print(soup.find(lambda tag: tag.name == "h2" and "Mine neaste vagter" in tag.text).find_next("table"))
You can use :-soup-contains (or just :contains) to target the <h2> by its text and then use find_next to move to the table:
from bs4 import BeautifulSoup as bs
html = '''your html'''
soup = bs(html, 'lxml')
soup.select_one('h2:-soup-contains("Mine neaste vagter")').find_next('table')
This is assuming the HTML, as shown, is returned by whatever access method you are using.
I have the following HTML that I want to crawl in Firefox using Selenium. I want to get the value of the attribute "title" (= information I want to extract) as well as its text (= more information I want to extract). The element is located inside an iframe.
<tr id="id1f7" class="new" data-oao-mailid="tmai16837d54b5315319" data-folderid="tfol11c18fac000026ec">
<td class="slct first">
<span class="form-input form-input-type-checkbox">
<input type="checkbox" id="id1f8" name="maillist:rowsCheckGroup" value="check0"/>
</span>
</td>
<td class="mark">
<a class="mail-read-mark marked" data-oao-hover="toggleRead" title="Als gelesen markieren" data-title-read="Als gelesen markieren" data-title-unread="Als ungelesen markieren">
<svg class="mail-read-icon" xmlns="http://www.w3.org/2000/svg">
<use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#status-unread_24"></use>
</svg>
</a>
</td>
<td class="name" title="information I need to extract;">
more information I need to extract
</td>
<td class="text">
</td>
<td class="date">10:48 Uhr</td>
<td class="size" id="id1f9">63,17 KB</td>
</tr>
Here is the code that I'm using:
iframe = driver.find_elements_by_xpath("//div[@class='app-contents-wrapper']//iframe")[4]
driver.switch_to.frame(iframe)
time.sleep(3)
emails = driver.find_elements_by_xpath("//tr[contains('@data-oao-mailid',tmail) and ./td[@class='name']]")
for w in emails:
    print(w.find_element_by_xpath(".//@title"))
However, when I execute the loop I keep getting the error "TypeError: node.ownerDocument is null". This is strange, since when I print out the emails-element,
for w in emails:
    print(w)
I get results which look like this:
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="ad91d2a0-ed8f-8f44-9646-dad80647b266", element="243d171e-ad4c-d94b-bf8f-698daa971e0c")>
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="ad91d2a0-ed8f-8f44-9646-dad80647b266", element="040e26db-ecb1-a24c-af95-09f91d182e3c")>
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="ad91d2a0-ed8f-8f44-9646-dad80647b266", element="938c8595-e753-9141-926a-71d3fd81963d")>
So the code I use to select the "emails"-elements seems to work. However, when I try to iterate through those elements and get the "title"-attribute, it seems like they can't be selected by my xpath-selector.
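Try this. find_element_by_xpath has to return an element, not an attribute node, so select the <td class="name"> cell and read its attribute with get_attribute:
for w in emails:
    name_cell = w.find_element_by_xpath("./td[@class='name']")
    print(name_cell.get_attribute("title"))
This will print
information I need to extract;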
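One detail worth checking is which element carries the title attribute: in the HTML above it sits on the <td class="name"> cell, not on the <tr>. The pattern "select the element, then read its attribute and text" can be tried offline. This is only a sketch using Python's standard xml.etree.ElementTree in place of Selenium, with ROW_XML being a shortened, well-formed copy of the row from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the row from the question, kept well-formed for ElementTree.
ROW_XML = """
<tr id="id1f7" class="new" data-oao-mailid="tmai16837d54b5315319">
  <td class="mark"><a class="mail-read-mark marked" title="Als gelesen markieren"></a></td>
  <td class="name" title="information I need to extract;">
    more information I need to extract
  </td>
  <td class="date">10:48 Uhr</td>
</tr>
"""

row = ET.fromstring(ROW_XML)

# Select the cell element first; attributes are read from the element afterwards.
name_cell = row.find("./td[@class='name']")
title = name_cell.get("title")
text = name_cell.text.strip()

print(title)
print(text)
```

With Selenium the equivalent calls on the located cell would be get_attribute("title") and the .text property.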
How can I click the "Actions" option with Python + Selenium? I have tried several methods; please help me with some suggestions. Thank you.
The following methods do not work:
driver.find_element_by_xpath('//*[@id="tabGroup_tabtable"]/tbody/tr/td[2]').click()
driver.find_element_by_css_selector("#tabGroup_tabtable > tbody > tr > td:nth-child(2)").click()
<table id="tabGroup_tabtable" class="tabGroup_tabtable">
<tbody>
<tr>
<td onclick="setFullHelpID(HelpLinks.EDITOR_COMPUTEROVERVIEW);tabGroupSetSelected(0);resize();" tabindex="0" onkeydown="if (event.keyCode==13||event.keyCode==32) {tabGroupSetSelected(0);resize();}" class="tab_selected">
<div class="tab_name">General</div>
</td>
<td onclick="setFullHelpID(HelpLinks.EDITOR_COMPUTEROVERVIEW_ACTIONS);tabGroupSetSelected(1);resize();" tabindex="0" onkeydown="if (event.keyCode==13||event.keyCode==32) {tabGroupSetSelected(1);resize();}" class="tab" onmouseover="this.className='tab_over';"
onmouseout="this.className='tab';">
<div class="tab_name">Actions</div>
</td>
<td onclick="setFullHelpID();tabGroupSetSelected(2);loadEvents();" tabindex="0" onkeydown="if (event.keyCode==13||event.keyCode==32) {tabGroupSetSelected(2);loadEvents();}" class="tab" onmouseover="this.className='tab_over';" onmouseout="this.className='tab';">
<div class="tab_name">System Events</div>
</td>
</tr>
</tbody>
</table>
Thank you, I have already solved it.
Since my page is inside a frame, I need to switch to the frame first using driver.switch_to.frame("input your frame name").
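A minimal sketch of that fix, assuming Selenium's Python bindings. The helper name click_tab_in_frame is made up for illustration, and the frame name has to be supplied by the caller because it is not shown in the question. The tab is located by its visible label rather than by its position in the row.

```python
def click_tab_in_frame(driver, frame_name, tab_label):
    """Switch into the named frame, then click the tab whose label matches."""
    # Elements inside a frame cannot be located until the driver switches into it.
    driver.switch_to.frame(frame_name)
    # Match the <td> that wraps <div class="tab_name">label</div>.
    xpath = (
        "//table[@id='tabGroup_tabtable']"
        f"//td[div[@class='tab_name' and normalize-space()='{tab_label}']]"
    )
    # "xpath" is the string value of Selenium's By.XPATH constant.
    driver.find_element("xpath", xpath).click()
    return xpath
```

It would be called as click_tab_in_frame(driver, "your frame name", "Actions") with an already started driver.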
I have the following HTML code:
<tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1">
<td class="table-matches__tt"><span class="table-matches__time" data-live-cell="time">19:00</span><span>Oberneuland</span> - <span>Habenhauser</span></td>
<td class="livebet" data-live-cell="livebet"> </td>
<td class="table-matches__streams" data-live-cell="score">
</td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v">1.10</td>
<td class="table-matches__odds" data-oid="2p2k5xv498x0x0">7.44</td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0">12.40</td>
</tr>
I am trying to scrape the three float values from this code: 1.10, 7.44 and 12.40.
The expression that I tried to use to get the values was the following:
response.xpath('//a/@target').extract()
The output that I get is 'mySelections'.
I want to get the value next to it. What is the right expression for it?
Thank you in advance
What's wrong
response.xpath('//a/@target').extract()
Why?
If you format your HTML, the error is obvious.
You want to extract text from a tag, not the target attribute.
<tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1">
<td class="table-matches__tt">
<span class="table-matches__time" data-live-cell="time">19:00</span>
<a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink">
<span>Oberneuland</span> - <span>Habenhauser</span>
</a>
</td>
<td class="livebet" data-live-cell="livebet"> </td>
<td class="table-matches__streams" data-live-cell="score"></td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v">
<a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6ev9v&otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0"
onclick="return my_selections_click('1x2', 'soccer');"
title="Add to My Selections"
target="mySelections">1.10</a>
</td>
<td class="table-matches__odds" data-oid="2p2k5xv498x0x0">
<a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv498x0x0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0"
onclick="return my_selections_click('1x2', 'soccer');"
title="Add to My Selections"
target="mySelections">7.44</a>
</td>
<td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0">
<a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6eva0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0"
onclick="return my_selections_click('1x2', 'soccer');"
title="Add to My Selections"
target="mySelections">12.40</a>
</td>
</tr>
How to fix it
Use one of the following:
response.xpath('//a/text()').extract()
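Some developers report that calling response.xpath directly sometimes causes bugs; if you run into that, you can build Scrapy's Selector yourself instead. Pass it the decoded response.text, since the text argument expects a string rather than bytes.
from scrapy.selector import Selector
result_array = Selector(text=response.text).xpath('//a/text()').extract()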
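The extracted values are strings, and with the formatted HTML above //a/text() would also pick up whitespace and the " - " text from the match link. A small sketch of turning the result into floats follows; to_floats is a hypothetical helper, and the sample list only imitates what extract() might return.

```python
def to_floats(texts):
    """Convert extracted text nodes such as '1.10' to floats, skipping the rest."""
    values = []
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue  # whitespace-only text node
        try:
            values.append(float(cleaned))
        except ValueError:
            # Not a number (for example the "-" between the team names), so ignore it.
            continue
    return values


# Sample input imitating the list returned by extract().
extracted = ["\n    ", " - ", "1.10", "7.44", "12.40"]
odds = to_floats(extracted)
print(odds)  # [1.1, 7.44, 12.4]
```

Narrowing the expression to //td[@class="table-matches__odds"]/a/text() should avoid the unrelated text nodes in the first place.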