I have started experimenting with Selenium in Python and I can't figure out this XPath.
<div title class="popupBox" style="left: 0px; top: 490px;">
<svg class="cp Drag" viewbox="0 0 24 24" style="height: 10px; width: 10px;">...</svg>
<div title class="eqBox">
<div title class="ib cp">
<img title src="src.svg">
<div title class="bred"></div>
<div title class="ds2">+1</div>
<div title class="ds3 IQ">TOP</div>
</div>
</div>
I need to locate the element containing TOP and read its text.
Hopefully that makes sense and I have provided what is needed.
Try:
driver.find_element(By.XPATH, '//*[@class="ds3 IQ"]').text.splitlines()[-1]
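To check the attribute predicate without a browser, here is a minimal sketch that runs the same kind of expression with Python's standard-library xml.etree.ElementTree instead of Selenium. The POPUP string is a trimmed, well-formed copy of the markup above (an assumption for illustration: the bare title attributes, the svg and the img are dropped so that it parses as XML). Note that @class="ds3 IQ" compares the whole attribute value, so it only matches when the class string is exactly that.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the popup markup (assumption for illustration).
POPUP = """
<div class="popupBox">
  <div class="eqBox">
    <div class="ib cp">
      <div class="bred"></div>
      <div class="ds2">+1</div>
      <div class="ds3 IQ">TOP</div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(POPUP.strip())

# @class is the XPath syntax for an attribute test; '#' is not valid here.
target = root.find(".//div[@class='ds3 IQ']")
result = target.text
print(result)
```

With a live Selenium session the equivalent lookup would be driver.find_element(By.XPATH, '//div[@class="ds3 IQ"]').text.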
I will try to explain this as best I can. I am trying to automatically enter data into a web page. This is a basic representation of it:
<html>
<head>
<body>
<div id="container">
<div id="header"></div>
<div id="content"></div>
<div id="footer"></div>
</div>
<div id="control_window_11" class="simple_window" style="margin: 0px; position: absolute; z-index: 10008; display: none;"><div class="modalReportWindowClose"></div></div>
<iframe undefined="position:absolute;filter:progid:DXImageTransform.Microsoft.Alpha(opacity=0);display:none" src="javascript:void(0);" frameborder="0" style="position: absolute; display: none;"></iframe>
</body>
</head>
</html>
<div id="control_window_10"></div> and <iframe></iframe> are both generated dynamically when I click save. After saving, the popup goes away and <div id="control_window_10"> changes to <div id="control_window_11">.
The issue appears after the first successful iteration of the program, when it clicks 'save' and the control_window id changes from 10 to 11 (it increases on every iteration: control_window_12, 13, 14 and so on). Although every iframe and div is generated the same way, Selenium keeps looking for control_window_10 and cannot find the elements. I have tried locating them by XPath, id and class, and I have even tried changing the numbers dynamically myself.
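One approach that may help here, sketched under the assumption that only the numeric suffix of the id changes between iterations: match on the stable prefix instead of the full id. The helper name id_prefix_selector is hypothetical; [id^="..."] is the standard CSS attribute-prefix selector.

```python
def id_prefix_selector(prefix: str, tag: str = "div") -> str:
    """Build a CSS selector matching any <tag> whose id starts with prefix."""
    return f'{tag}[id^="{prefix}"]'


selector = id_prefix_selector("control_window_")
print(selector)

# With a live Selenium session (not run here), the lookup would be repeated
# after every save instead of reusing a reference found earlier:
#
#   from selenium.webdriver.common.by import By
#   windows = driver.find_elements(By.CSS_SELECTOR, selector)
```

Looking the element up again on each iteration avoids holding on to a reference to the previous, now replaced, control_window element.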
I have a website built around a table. Each row has two <a> link elements.
Clicking the second one triggers JavaScript that opens a menu.
With Selenium I need to navigate that menu to the "Properties" item and then to its submenu item "Versions".
I tried a simple find_element by link text, but it doesn't find the item:
driver.find_element(by=By.LINK_TEXT, value="Properties").click()
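I also tried waiting for it inside a try block, but it still doesn't find my element:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# delay is the timeout in seconds
try:
    myElem = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.LINK_TEXT, "Properties")))
    print("Found")
except TimeoutException:
    print("Not Found!")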
How can I access that element?
I need to select the "Properties" menu item; this displays a submenu containing "Versions", which is the item I need to click.
Here is a sample of the page's HTML:
<td class="browseItemName" scope="rowgroup" nowrap="" width="100%">
<a class="mylink1" href="/test/">test link</a>
<div class="functionMenuDiv" style="left: 359px; top: 516px; z-index: 5; visibility: visible;">
<span class="functionMenu">
<div
class="menuItem"
onclick="javascript:popup_callback( escape( '/testlink/test12' ),'');return false;"
onmouseover="javascript:hiLight( 'mymenu0' );doSubMenu( 'mymenu0', 'funMenu4124' );"
onmouseout="javascript:loLight( 'mymenu0' );"
width="100%"
>
Download
</div>
<div id="mymenu14" class="menuItem" onmouseover="javascript:hiLight( 'mymenu14' );doSubMenu( 'mymenu14', 'funMenu4124' );" width="100%">
<div class="menuItemWithSubmenu" onclick="javascript:doKeyboardSubMenu( 'mymenu14', 'funMenu4124', event );return false;">
Properties
</div>
</div>
<div class="lastItemHilite"><img src="/tessst/spacer.gif" alt="" width="1" height="1" /></div>
</span>
<span class="subMenu" id="mymenu14Sub">
<div
class="menuItem"
onclick="javascript:popup_callback( escape( '/testlink/viewType%3D1' ), '' );return false;"
onmouseover="javascript:hiLight( 'mymenu14.0' );doSubMenu( 'mymenu14.0', 'mymenu14Sub' );"
onmouseout="javascript:loLight( 'mymenu14.0' );"
width="100%"
>
General
</div>
<div
class="menuItem"
onclick="javascript:popup_callback( escape( '/testlink/3D1' ), '' );return false;"
onmouseover="javascript:hiLight( 'mymenu14.5' );doSubMenu( 'mymenu14.5', 'mymenu14Sub' );"
onmouseout="javascript:loLight( 'mymenu14.5' );"
width="100%"
>
Versions
</div>
<div class="lastItemHilite"><img src="/tessst/spacer.gif" alt="" width="1" height="1" /></div>
</span>
</div>
</td>
Try the following:
driver.find_element(By.XPATH, xpath)
Here xpath is the link's XPath. If you're wondering how to get it, right-click the link element and choose Inspect. The element's source will come up in the developer tools panel on the right. Right-click it there and choose Copy > Copy full XPath.
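A likely reason By.LINK_TEXT fails in the question: that strategy only matches the text of <a> elements, and in the sample HTML "Properties" and "Versions" sit inside <div> elements. The sketch below illustrates this on a trimmed, well-formed copy of the markup (an assumption so that it parses with the standard-library xml.etree.ElementTree; the event-handler attributes and the "Download" entry are dropped).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the menu markup (assumption for illustration).
MENU = """
<td class="browseItemName">
  <a class="mylink1" href="/test/">test link</a>
  <div class="functionMenuDiv">
    <span class="functionMenu">
      <div id="mymenu14" class="menuItem">
        <div class="menuItemWithSubmenu">
          Properties
        </div>
      </div>
    </span>
    <span class="subMenu" id="mymenu14Sub">
      <div class="menuItem">
        General
      </div>
      <div class="menuItem">
        Versions
      </div>
    </span>
  </div>
</td>
"""

root = ET.fromstring(MENU.strip())

# Text of every <a>: the only candidates a link-text lookup can match.
link_texts = [a.text.strip() for a in root.iter("a")]

# Text held directly by <div> elements: where the menu entries really are.
div_texts = [d.text.strip() for d in root.iter("div") if d.text and d.text.strip()]

print(link_texts)
print(div_texts)
```

In Selenium, an XPath that matches the <div> text, such as //div[@class="menuItemWithSubmenu"][normalize-space()="Properties"] followed by //span[@id="mymenu14Sub"]/div[normalize-space()="Versions"], may work once the menu is open. These expressions are derived from the sample HTML only and are untested against the real page.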
I am new to Python and Scrapy. When I try to iterate over nested HTML elements, I don't get the desired result.
Below is the HTML I am trying to scrape.
<div class="level1" role="main">
<div class="level2">
<h1 id="fullStoreHeading" class="class_h1">Page Title</h1>
<div class="fsdColumn_3">
<div class='fsdDeptBox'>
<img alt="" src="" aria-hidden="true" height="100%" width="100%">
<h2 class="fsdDeptTitle">TV</h2>
<div class='fsdDeptCol'>
<a class="class_a" href="/test?_encoding=UTF8&id=1001">Samsung</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1002">Vizio</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1003">Element</a>
</div>
</div>
<div class='fsdDeptBox'>
<img alt="" src="" aria-hidden="true" height="100%" width="100%">
<h2 class="fsdDeptTitle">Laptop</h2>
<div class='fsdDeptCol'>
<a class="class_a" href="/test?_encoding=UTF8&id=1004">Apple</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1005">Microsoft</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1006">Dell</a>
</div>
</div>
</div>
<div class="fsdColumn_3">
<div class='fsdDeptBox'>
<img alt="" src="" aria-hidden="true" height="100%" width="100%">
<h2 class="fsdDeptTitle">Video Game Console</h2>
<div class='fsdDeptCol'>
<a class="class_a" href="/test?_encoding=UTF8&id=1007">Xbox One</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1008">Xbox 360</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1009">PS 5</a>
</div>
</div>
<div class='fsdDeptBox'>
<img alt="" src="" aria-hidden="true" height="100%" width="100%">
<h2 class="fsdDeptTitle">SSD</h2>
<div class='fsdDeptCol'>
<a class="class_a" href="/test?_encoding=UTF8&id=1010">Samsung Evo</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1011">Crucial</a>
<a class="class_a" href="/test?_encoding=UTF8&id=1012">Sandisk</a>
</div>
</div>
</div>
</div>
The output I am trying to generate from the above html is a list of:
Product Category -> Brand -> Id
E.g.
TV
Samsung 1001
Vizio 1002
Element 1003
Laptop
Apple 1004
Microsoft 1005
Dell 1006
Video Game Console
Xbox One 1007
Xbox 360 1008
PS 5 1009
ProductCategories.py
def parse(self, response):
    l = ItemLoader(item=ProductSpiderItem(), response=response)
    titles = response.xpath('//*[@class="fsdDeptTitle"]')
    for title in titles:
        Product_Category = title.xpath('text()').extract()
        l.add_value('Product_Category', Product_Category)
        for brnd in title.xpath('//*[@class="fsdDeptCol"]/a[@class="class_a"]'):
            Brand = brnd.xpath('text()').extract()
            l.add_value('Brand', Brand)
    return l.load_item()
At the moment the outer for loop outputs every product category once, but the inner for loop outputs all the brands, regardless of product category, each time the outer loop runs.
I would really appreciate any help resolving the issue.
Thanks a lot.
Your first 'for' loop iterates over the title elements, such as <h2 class="fsdDeptTitle">SSD</h2>. You then try to look inside each of those for class=class_a. It can't do that, because the first 'for' loop is too specific: the <h2> does not contain the HTML where 'class_a' is.
You can fix this by having your 'for' loops look one level higher in the HTML.
titles = response.xpath("//*[@class='fsdDeptBox']")
for title in titles:
    # The category name is in the <h2> inside each box
    Product_Category = title.xpath('h2[@class="fsdDeptTitle"]/text()').extract()
    l.add_value('Product_Category', Product_Category)
    # Relative path: only the brand links inside this box
    for brnd in title.xpath('div[@class="fsdDeptCol"]'):
        Brand = brnd.xpath('*/text()').extract()
        l.add_value('Brand', Brand)
return l.load_item()
I changed the first 'for' loop to select enough of the HTML to include a path to the 'class_a' text.
Side note: I don't know much about the correct HTML terms, but I hope this still makes sense.
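The scoping idea can be tried without a spider. An XPath that begins with // searches the whole document even when it is called on a nested selector, while a relative path stays inside the current element. The sketch below shows the relative form with the standard-library xml.etree.ElementTree rather than Scrapy, on a shortened, well-formed copy of the page (an assumption for illustration: the img tags are dropped and & is escaped as &amp;), and also pulls the id out of each href.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# Shortened, well-formed copy of the page (assumption for illustration).
PAGE = """
<div class="level2">
  <div class="fsdColumn_3">
    <div class="fsdDeptBox">
      <h2 class="fsdDeptTitle">TV</h2>
      <div class="fsdDeptCol">
        <a class="class_a" href="/test?_encoding=UTF8&amp;id=1001">Samsung</a>
        <a class="class_a" href="/test?_encoding=UTF8&amp;id=1002">Vizio</a>
      </div>
    </div>
    <div class="fsdDeptBox">
      <h2 class="fsdDeptTitle">Laptop</h2>
      <div class="fsdDeptCol">
        <a class="class_a" href="/test?_encoding=UTF8&amp;id=1004">Apple</a>
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())
catalog = {}

# Outer loop: one iteration per department box.
for box in root.findall(".//div[@class='fsdDeptBox']"):
    category = box.findtext("h2")
    brands = []
    # Inner loop: the path is relative to this box, so other boxes are ignored.
    for link in box.findall("./div[@class='fsdDeptCol']/a"):
        product_id = parse_qs(urlparse(link.get("href")).query)["id"][0]
        brands.append((link.text, product_id))
    catalog[category] = brands

for category, brands in catalog.items():
    print(category)
    for brand, product_id in brands:
        print(brand, product_id)
```

In Scrapy the same effect comes from calling xpath() on each box selector with a path that has no leading //, or that starts with .//.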
I think you should look a bit more into how ItemLoaders work. They also depend on how your items and item loaders are defined. For example, let's assume you've defined them like this:
from itemloaders.processors import TakeFirst
from scrapy import Field, Item
from scrapy.loader import ItemLoader

class ProductItem(Item):
    category = Field()
    brand = Field()

class ProductItemLoader(ItemLoader):
    default_item_class = ProductItem
    default_output_processor = TakeFirst()
Then you could do something like this:
for product in response.css('.fsdDeptCol a'):
    il = ProductItemLoader(selector=product)
    il.add_xpath('category', './ancestor::*/preceding-sibling::h2/text()')
    il.add_xpath('brand', './text()')
    yield il.load_item()
I would appreciate it if you could point me in the right direction. Is there a better way of doing this that captures all the data (everything inside the element with class "DocumentText")?
If I do it like this, I am missing some tags at the end. The original HTML string is about 20K in size, so it is a lot of data.
soup = BeautifulSoup(r.content, 'html5lib')
c.case_html = str(soup.find('div', class_='DocumentText'))
print(self.case_html)
The following scraping code works for now, but it breaks the moment a new kind of tag is added.
soup = BeautifulSoup(r.content, 'html5lib')
c.case_html = str(soup.find('div', class_='DocumentText').find_all(['p','center','small']))
print(self.case_html)
The sample HTML is as follows; the original is around 20K in size.
<form name="form1" id="form1">
<div id="theDocument" class="DocumentText" style="position: relative; float: left; overflow: scroll; height: 739px;">
<p>PTag</p>
<p> <center> First center </center> </p>
<small> this is small</small>
<p>...</p>
<p> <center> Second Center </center> </p>
<p>....</p>
</div>
</form>
The expected output is this:
<div id="theDocument" class="DocumentText" style="position: relative; float: left; overflow: scroll; height: 739px;">
<p>PTag</p>
<p> <center> First center </center> </p>
<small> this is small</small>
<p>...</p>
<p> <center> Second Center </center> </p>
<p>....</p>
</div>
You can try this. I based my answer on the HTML you provided. If you need clarification, just let me know. Thanks!
soup = BeautifulSoup(r.content, 'html5lib')
# select_one returns the first matching element (select returns a list)
case_html = soup.select_one('div.DocumentText')
# Printing the element outputs its HTML, tags included
print(case_html)
I have been trying to automate a few actions on a web-based tool using Selenium, with Python as the scripting language.
On this page I have a field called Status. Here is its HTML:
<div style="padding-left:105px" id="x-form-el-BugFieldsEditor_Status-desc" class="x-form-element x-form-el-BugFieldsEditor_Status-desc" role="presentation">
<div class=" x-form-field-wrap x-component " role="combobox" id="BugFieldsEditor_Status-desc" style="width: 230px;">
<input type="text" class=" x-form-field x-form-text x-triggerfield-noedit" id="BugFieldsEditor_Status-desc-input" name="Status-desc" tabindex="1" readonly="" autocomplete="off" aria-owns="x-auto-462" aria-selected="" style="width: 205px;" aria-readonly="false" aria-invalid="false" aria-required="false">
<img class="x-form-trigger x-form-trigger-arrow" src="https://someurl/clear.gif" id="x-auto-463">
</div>
</div>
It appears on the page as a combo box drop-down list. I can locate this element with find element by ID and simulate a click event, which brings up the list of options it contains. The HTML for that list is:
<div role="presentation" id="x-auto-881" class="x-combo-list x-ignore x-component x-border " style="border-width: 1px; z-index: 1060; visibility: visible; height: 273px; width: 228px; left: 277px; top: 75px;">
<div tabindex="0" hidefocus="true" id="x-auto-462" class=" x-view x-combo-list-inner x-component x-unselectable " style="overflow-x: hidden; padding: 0px; border-width: 0px; height: 273px; width: 228px;" unselectable="on">
<div class="x-combo-list-item " role="listitem">A-Assigned</div>
<div class="x-combo-list-item " role="listitem">C-Closed</div>
<div class="x-combo-list-item " role="listitem">D-Duplicate</div>
<div class="x-combo-list-item " role="listitem">F-Forwarded</div>
<div class="x-combo-list-item " role="listitem">H-Held</div>
<div class="x-combo-list-item " role="listitem">I-Info_req</div>
<div class="x-combo-list-item " role="listitem">J-Junked</div>
<div class="x-combo-list-item " role="listitem">M-More</div>
<div class="x-combo-list-item " role="listitem">O-Opened</div>
<div class="x-combo-list-item x-view-highlightrow x-combo-selected" role="listitem">P-Postponed</div>
<div class="x-combo-list-item " role="listitem">R-Resolved</div>
<div class="x-combo-list-item " role="listitem">U-Unreproducible</div>
<div class="x-combo-list-item" role="listitem">W-Wait</div>
</div>
</div>
Now the issue: after the simulated click the list of options is displayed, but I am not able to locate the list or select any option.
Can someone please help?
You should be able to get it with an xpath selector like this:
driver.find_element(By.XPATH, '//div[text()="A-Assigned"]')
You may need to make the selector more specific depending on the rest of your HTML.
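As one way to make the selector more specific, the sketch below combines the role attribute with the option text. It uses the standard-library xml.etree.ElementTree on a shortened, well-formed copy of the list markup (an assumption for illustration), since the Selenium call itself needs the live page. The helper name find_option is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the option list (assumption for illustration).
COMBO = """
<div id="x-auto-462" class="x-view x-combo-list-inner">
  <div class="x-combo-list-item " role="listitem">A-Assigned</div>
  <div class="x-combo-list-item " role="listitem">C-Closed</div>
  <div class="x-combo-list-item x-view-highlightrow x-combo-selected" role="listitem">P-Postponed</div>
  <div class="x-combo-list-item" role="listitem">W-Wait</div>
</div>
"""

root = ET.fromstring(COMBO.strip())


def find_option(label):
    """Return the list item whose full text equals label, or None."""
    return root.find(f".//div[@role='listitem'][.='{label}']")


assigned = find_option("A-Assigned")
postponed = find_option("P-Postponed")
missing = find_option("Z-Unknown")

print(assigned.text)
print(postponed.get("class"))
print(missing)
```

Matching on the text avoids depending on the class attribute, which differs between items (trailing space, extra classes on the selected row). In Selenium the corresponding XPath would be //div[@role="listitem"][normalize-space()="A-Assigned"]; because the list is rendered only after the click, waiting with WebDriverWait and EC.element_to_be_clickable before calling click() may also be needed.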