Getting exact HTML of an xpath from selenium python driver - python

Basically I am trying to get the HTML table of a page to save in a database. I am getting the element with:
data = browser.find_element_by_xpath('/html/body/table[2]')
Here is the HTML that I am looking for:
<table class="general" cellspacing="2" cellpadding="3" border="0" width="550">
<tbody>
<tr class="generaltitle">
<td colspan="3">I am a basic title</td>
</tr>
<tr valign="top"><td colspan="3"><hr></td></tr>
</tbody></table>
But when I call this it gives me a raw result like this:
"I am a basic title\n \n\n "
<br>
<hr>
I am calling it like this: data.get_attribute('innerHTML')
How to get the exact HTML?

Perhaps you were looking for data.get_attribute('outerHTML')
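For context: innerHTML returns only the markup inside the element, while outerHTML also includes the element's own opening and closing tags, which is what the question asks for. A minimal sketch, assuming a Selenium WebDriver instance is passed in; the helper name exact_html is hypothetical and not part of Selenium:

```python
def exact_html(driver, xpath):
    """Return the full markup of the first element matching `xpath`."""
    # "xpath" is the string value of Selenium's By.XPATH constant.
    element = driver.find_element("xpath", xpath)
    # outerHTML includes the element's own tag (here <table ...>);
    # innerHTML would start at the first child (<tbody>) instead.
    return element.get_attribute("outerHTML")
```

With the browser object from the question, this would be called as exact_html(browser, '/html/body/table[2]').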

Related

How to extract the innerText of a <td> element with respect to the innerText of another <td> element

I am using selenium in python. I have come across this table webelement. I need to check if a string is present in the webelement and return a corresponding string in case it is present.
<table width="700px" class="tableListGrid">
<thead>
<tr class="tableInfoTrBox">
<th>Date</th>
<th>Task Code</th>
<!-- th>Phone Number</th -->
<th>Fota Job</th>
<th colspan="2" class="thLineEnd">Task Description</th>
</tr>
</thead>
<tbody>
<tr class="tableTr_r">
<td>2018-04-06 05:48:29</td>
<td>FU</td>
<!-- td></td -->
<td>
57220180406-JSA69596727
</td>
<td style="text-align:left;">
updated from [A730FXXU1ARAB/A730FOJM1ARAB/A730FXXU1ARAB] to [A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9]
</td>
<td>
<table class="btnTypeE">
<tr>
<td>
View
</td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
I need to search for "A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9" in this element and return "57220180406-JSA69596727" which is present in same row at a different place in the web page. Is it possible to do in selenium ?
EDIT: Cleaned the code to only contain useful data.
It can be achieved by finding the element using the following Xpath:
//td[contains(., 'A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9')]/preceding-sibling::td[1]/a
Xpath can be read as
find td which contains "A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9". Then find the td
preceding the found td and move to the a tag
After this you can get text using selenium
driver.find_element(By.XPATH, "//td[contains(., 'A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9')]/preceding-sibling::td[1]/a").text
To look for a text, e.g. A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9, and return an associated text, e.g. 57220180406-JSA69596727, you can write a function as follows:
def test_me(myString):
    # contains() is used because the cell holds other text around the version string
    myText = driver.find_element_by_xpath("//table[@class='tableListGrid']//tbody/tr[@class='tableTr_r']//td[contains(., '" + myString + "')]/preceding-sibling::td[1]/a").get_attribute("innerHTML")
    return myText
Now, from your main()/@Test you can call the function with the desired text as follows:
test_me("A730FXXU2ARC9/A730FOJM2ARC1/A730FXXU2ARC9")

Selenium Python find text on page and click next ahref link

I am trying to use selenium to find the text "APPELLANT'S BRIEF FILED" and then have selenium click the very next href link. Below is the table markup on the page and the relevant row that I am focused on.
<table class="gridview" cellspacing="0" align="Center"
id="SheetContentPlaceHolder_caseDocket_gvDocketInformation"
style="border-collapse:collapse;">
<tbody><tr class="gridview_header">
This is the code I am focused on.
<tr style="background-color:Gainsboro;">
<td align="left" valign="top" style="width:75px;">04/10/2015</td>
<td align="left" valign="top">A1</td>
<td align="left" valign="top">EV</td>
<td align="left">APPELLANT'S BRIEF FILED. APPELLANT'S BRIEF</td>
<td align="center">
<a href="DisplayImageList.aspx?q=IXEpMLEtUn6VTtFyd8FAyx5-hPNZuKfx0"
target="_blank"><img src="images/ImageSheet.png" alt=""></a>
</td>
</tr>
Try this xpath //td[contains(., "APPELLANT'S BRIEF FILED")]/following-sibling::td[1]/a
driver.find_element_by_xpath("""//td[contains(., "APPELLANT'S BRIEF FILED")]/following-sibling::td[1]/a""")
To find any text within the table, e.g. APPELLANT'S BRIEF FILED, and then invoke click() on the very next href link, you can write a function that accepts the desired text as input and clicks the next link as follows:
def test_me(myString):
    # find_element (singular) returns one clickable element; the XPath literal is double-quoted because the text contains an apostrophe
    driver.find_element_by_xpath('//td[contains(., "' + myString + '")]/following-sibling::td[1]/a').click()
Now you can call test_me() from anywhere within your program with any text item from the table to click the relevant link:
test_me("APPELLANT'S BRIEF FILED")
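One thing to watch when building an XPath by string concatenation: the search text here contains an apostrophe, so it cannot be dropped into a single-quoted XPath literal. XPath 1.0 has no escape character, so the usual workaround is to pick the other quote style, or fall back to concat() when the text contains both kinds. A minimal sketch; xpath_literal and next_link_xpath are hypothetical helper names, not Selenium functions:

```python
def xpath_literal(text):
    """Quote `text` so it can be embedded safely in an XPath 1.0 expression."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote kinds present: split on apostrophes and glue the pieces
    # back together with concat('piece', "'", 'piece', ...).
    pieces = ["'" + part + "'" for part in text.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def next_link_xpath(text):
    """XPath for the link in the cell right after the cell containing `text`."""
    return "//td[contains(., " + xpath_literal(text) + ")]/following-sibling::td[1]/a"


print(next_link_xpath("APPELLANT'S BRIEF FILED"))
```

The resulting string can be passed to the driver's XPath lookup in place of the hand-concatenated one.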

scrapy xpath return empty data from table

I am trying to get the href from this table:
<div class="squad-container">
<table class="table squad sortable" id="page_team_1_block_team_squad_8-table">
<thead>
<tr class="group-head">
<th colspan="4">Goalkeepers </th>
</tr>
</thead>
<tbody>
<tr>
<td style="width:50px;">Reda Sayed</td>
<td style="vertical-align: top;">
<div><a href="/474798/" >Reda Sayed</a></div>
<div style="padding-left: 27px;">25 years old</div>
</td>
</tr>
</tbody>
I use:
response.xpath('//table[@class="table squad sortable"]//tr//td//a/@href').extract_first()
and it didn't work. I need to know what the problem in the code is, and what the difference is between using a double slash (//) and a single slash.
I don't think there is any problem with your xpath from a human's perspective. However, the xpath or css can be different from your spider's perspective, i.e. your spider may 'see' the page differently.
Try using 'scrapy shell' to test your xpath or css and see if any data can be extracted. Here is the link to the doc in case you need it: https://doc.scrapy.org/en/latest/topics/shell.html
To sum up: adjust the xpath to match the page as your spider sees it, because the spider won't find any data with an xpath that doesn't match, and scrapy shell can help you check. :)
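On the second part of the question: a single slash selects direct children only, while a double slash selects descendants at any depth. A small sketch of the difference, using Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selector (it supports only a subset of XPath, and the markup below is a trimmed, well-formed copy of the table from the question):

```python
import xml.etree.ElementTree as ET

MARKUP = """
<div class="squad-container">
  <table class="table squad sortable">
    <tbody>
      <tr>
        <td>Reda Sayed</td>
        <td>
          <div><a href="/474798/">Reda Sayed</a></div>
          <div>25 years old</div>
        </td>
      </tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(MARKUP)
table = root.find(".//table[@class='table squad sortable']")

# Single slash: <a> must be a direct child of <td>. Here it sits inside
# a <div>, so nothing matches.
direct_children = table.findall("./tbody/tr/td/a")

# Double slash: <a> may be nested at any depth below <td>.
any_depth = table.findall(".//td//a")
hrefs = [a.get("href") for a in any_depth]

print(direct_children)  # -> []
print(hrefs)            # -> ['/474798/']
```

If the double-slash version also returns nothing in scrapy shell, the markup the spider receives probably differs from what the browser shows.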

Get td class text with selenium

So I want to take the text of td class.
The html page
<table class="table table-striped">
<tbody>
<tr>
<td class="text-center">
<img .....>
</td>
<td>text</td>
<td>text</td>
<td class="text-center">
<a ....></a>
</td>
<td class="text-center">
TEXT I WANT TO TAKE HERE
</td>
<td class="text-center">
<a ....><i class="fa fa-times"></i></a>
</td>
</tr>
</tbody>
</table>
The text I want to take is "TEXT I WANT TO TAKE HERE".
I tried using the xpath like below but it didn't work:
table = browser.find_element_by_xpath(("//div[@class='table table-striped']/tbody/tr/td[5]"));
I got an error saying:
no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='table table-striped']/tbody/tr/td[5]"}
Is it because I have multiple classes in the selector and I have to use dot?
(I tried: 'table.table-striped' but it still didn't work)
Your xpath is incorrect. Your HTML has a table tag, but you are looking for a div tag. So you just need to replace div with table.
table = browser.find_element_by_xpath("//table[@class='table table-striped']/tbody/tr/td[5]")
Use the xpath below to get the text:
browser.find_element_by_xpath("//td[@class='text-center']").text
And use the index as well to better find your row, e.g.
browser.find_element_by_xpath("//td[@class='text-center'][3]").text
Use the xpath below to get the text TEXT I WANT TO TAKE HERE:
//table//tr/td[contains(text(), 'TEXT I WANT TO TAKE HERE')]
Updated Answer: You can refer any of these below mentioned xpath to get your webelement.
//td[5]
OR
//table[@class='table table-striped']//td[5]
OR
//table[@class='table table-striped']/..//following-sibling::td[5]
OR
//td[@class='text-center'][3]
In your XPath expression you are looking for a div tag, but your HTML does not have one. Perhaps you meant the table tag:
table = browser.find_element_by_xpath("//table[@class='table table-striped']/tbody/tr/td[5]")
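The div-versus-table point can be checked without a browser. Below is a rough sketch using Python's standard-library xml.etree.ElementTree, which understands the subset of XPath used here. The markup is a simplified, well-formed copy of the table from the question; the body wrapper and the src/href values are placeholders assumed for illustration.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<body>
  <table class="table table-striped">
    <tbody>
      <tr>
        <td class="text-center"><img src="placeholder.png"/></td>
        <td>text</td>
        <td>text</td>
        <td class="text-center"><a href="#"></a></td>
        <td class="text-center">
          TEXT I WANT TO TAKE HERE
        </td>
        <td class="text-center"><a href="#"><i class="fa fa-times"></i></a></td>
      </tr>
    </tbody>
  </table>
</body>
"""

root = ET.fromstring(MARKUP)

# Wrong tag name: there is no <div> with that class, so nothing is found.
missing = root.find(".//div[@class='table table-striped']/tbody/tr/td[5]")

# Right tag name: the fifth <td> of the row is the wanted cell.
cell = root.find(".//table[@class='table table-striped']/tbody/tr/td[5]")
cell_text = cell.text.strip()

print(missing)    # -> None
print(cell_text)  # -> TEXT I WANT TO TAKE HERE
```

The multi-word class value is not the problem: @class='table table-striped' compares the whole attribute string, so it matches as long as the attribute is written exactly that way.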

Parsing an HTML file with selectorgadget.com

How can I use beautiful soup and selectorgadget to scrape a website. For example I have a website - (a newegg product) and I would like my script to return all of the specifications of that product (click on SPECIFICATIONS) by this I mean - Intel, Desktop, ......, 2.4GHz, 1066Mhz, ...... , 3 years limited.
After using selectorgadget I get the string-
.desc
How do I use this?
Thanks :)
Inspecting the page, I can see that the specifications are placed in a div with the ID pcraSpecs:
<div id="pcraSpecs">
<script type="text/javascript">...</script>
<TABLE cellpadding="0" cellspacing="0" class="specification">
<TR>
<TD colspan="2" class="title">Model</TD>
</TR>
<TR>
<TD class="name">Brand</TD>
<TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Intel'));</script></TD>
</TR>
<TR>
<TD class="name">Processors Type</TD>
<TD class="desc"><script type="text/javascript">document.write(neg_specification_newline('Desktop'));</script></TD>
</TR>
...
</TABLE>
</div>
desc is the class of the table cells.
What you want to do is to extract the contents of this table.
soup.find(id="pcraSpecs").findAll("td") should get you started.
Have you tried using Feedity - http://feedity.com for creating a custom RSS feed from any webpage.
