I am trying to search into a table for a specific value (Document ID) and then press a button that is next to that column (Retire). I should add here that the 'Retire' button is only visible once the mouse is hovered over, but I have built that into my code which I'll share further down.
So for example:
My Document ID would be 0900766b8001b6a3, and I would want to click the button called 'Retire'. The issue I'm having is pulling the XPaths for the Retire buttons, as this needs to be dynamic. I got it working for some Document IDs that share a common link with their button ID. For example:
A700000007201082 has the XPath //*[@id="retire-7201082"] (you can see the commonality here: the Document ID ends in the same digits as the XPath number, 7201082). Whereas in the first example, the XPath for '0900766b8001b6a3' is //*[@id="retire-251642"]; the retire number here bears no relation to the Document ID, which makes the XPath hard to build manually.
Here is my code:
from selenium.webdriver.common.action_chains import ActionChains
import time

before_XPath = "//*[@class='wp-list-table widefat fixed striped table-view-list pages']/tbody/tr["
aftertd_XPath = "]/td["
aftertr_XPath = "]"
search_text = "0900766b8001af05"
time.sleep(10)
num_rows = len(driver.find_elements_by_xpath("//*[@class='wp-list-table widefat fixed striped table-view-list pages']/tbody/tr"))
num_columns = len(driver.find_elements_by_xpath("//*[@class='wp-list-table widefat fixed striped table-view-list pages']/tbody/tr[2]/td"))
elem_found = False
for t_row in range(2, num_rows + 1):
    for t_column in range(1, num_columns + 1):
        FinalXPath = before_XPath + str(t_row) + aftertd_XPath + str(t_column) + aftertr_XPath
        cell_text = driver.find_element_by_xpath(FinalXPath).text
        if cell_text.casefold() == search_text.casefold():
            print("Search Text " + search_text + " is present at row " + str(t_row) + " and column " + str(t_column))
            elem_found = True
            achains = ActionChains(driver)
            move_to = driver.find_element_by_xpath("/html/body/div[1]/div[2]/div[2]/div[1]/div[3]/form[1]/table/tbody/tr[" + str(t_row) + "]/td[2]")
            achains.move_to_element(move_to).perform()
            retire_xpath = driver.find_element_by_xpath("//*[@id='retire-" + str(search_text[-7:]) + "']")
            time.sleep(6)
            driver.execute_script("arguments[0].click();", retire_xpath)
            time.sleep(6)
            driver.switch_to.alert.accept()
            break
if not elem_found:
    print("Search Text " + search_text + " not found")
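As an aside, the cell-by-cell scan above can usually be collapsed into a single XPath that matches the row by its text. A sketch of the locator builder (the table class is taken from the code above; which td actually holds the Document ID may need checking against the real page):

```python
def doc_row_xpath(doc_id):
    # Matches the table row that has a cell whose normalized text equals
    # the Document ID, instead of reading every cell in a Python loop.
    # Table class copied from the question's code; cell position assumed.
    return ("//*[@class='wp-list-table widefat fixed striped table-view-list pages']"
            f"/tbody/tr[td[normalize-space()='{doc_id}']]")
```

The row element can then be fetched in one call, e.g. `row = driver.find_element_by_xpath(doc_row_xpath("0900766b8001af05"))`, and hovered/clicked from there.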
This particular bit of code lets me handle any Document IDs such as 'A700000007201082' as I can just cut off the part I need and build it into an XPath:
retire_xpath = driver.find_element_by_xpath("//*[@id='retire-" + str(search_text[-7:]) + "']")
I've tried to replicate the above for the Doc IDs starting with 09007, but I can't find a way to pull that unique number, as it isn't accessible anywhere in the table.
I am wondering if there's something I can do to build it the same way I have above or perhaps focus on the index? Any advice is much appreciated, thanks.
EDIT:
This is the HTML code for the RETIRE button for Document ID: 0900766b8001b6a3
<span class="retire"><button id="retire-251642" data-document-id="251642" rel="bookmark" aria-label="Retire this document" class="rs-retire-link">Retire</button></span>
You can see the retire button id is completely different to the Document ID. Here is some HTML code just above it which I think could be useful:
<div class="hidden" id="inline_251642">
<div class="post_title">General Purpose Test Kit Lead</div><div class="post_name">0900766b8001b6a3</div>
<div class="post_author">4</div>
<div class="comment_status">closed</div>
<div class="ping_status">closed</div>
<div class="_status">publish</div>
<div class="jj">30</div>
<div class="mm">03</div>
<div class="aa">2001</div>
<div class="hh">15</div>
<div class="mn">43</div>
<div class="ss">03</div>
<div class="post_password"></div><div class="post_parent">0</div><div class="page_template">default</div><div class="tags_input" id="rs-language-code_251642">de, en, fr, it</div><div class="tags_input" id="rs-current-state_251642">public</div><div class="tags_input" id="rs-doc-class-code_251642">rs_instruction_sheet</div><div class="tags_input" id="rs-restricted-countries_251642"></div></div>
Would it be possible to target the div with class "post_name", as this has the correct Doc ID, and then press the RETIRE button for that specific Doc ID?
Thank you.
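That route looks feasible: the hidden container's id (inline_251642) carries the same number as the retire button id (retire-251642), so matching the post_name div against the Document ID and taking the numeric suffix of its parent's id yields the button id. A sketch with the stdlib's xml.etree standing in for Selenium, run against a trimmed copy of the HTML above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the hidden block from the question.
SAMPLE = """<div class="hidden" id="inline_251642">
<div class="post_title">General Purpose Test Kit Lead</div><div class="post_name">0900766b8001b6a3</div>
</div>"""

def retire_button_id(fragment, doc_id):
    # Find the container whose post_name child holds the Document ID,
    # then reuse the numeric suffix of that container's id ("inline_<n>")
    # to build the retire button id ("retire-<n>").
    root = ET.fromstring(fragment)
    for container in [root] + root.findall(".//div[@id]"):
        name_div = container.find("div[@class='post_name']")
        if name_div is not None and (name_div.text or "").strip() == doc_id:
            return "retire-" + container.get("id").split("_")[-1]
    return None
```

In Selenium the equivalent would be something like `driver.find_element_by_xpath("//div[starts-with(@id,'inline_')][div[@class='post_name']='0900766b8001b6a3']").get_attribute("id")` followed by `driver.find_element_by_id("retire-" + num)` — untested against the real page, so treat it as a starting point.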
Related
Hey all, trust that you're well. I'm trying to find elements by class_name and loop through them; however, they all have the same class_name.
I've discovered that they contain different index numbers, and I'm trying to use those to loop through them.
Example of the element and the index:
<div class="member-2gU6Ar container-1oeRFJ clickable-28SzVr" aria-controls="popout_4188" aria-expanded="false" tabindex="-1" colorroleid="987314373729067059" index="0" role="listitem" data-list-item-id="members-987320208253394947___0">
<div class="member-2gU6Ar container-1oeRFJ clickable-28SzVr" aria-controls="popout_4184" aria-expanded="false" tabindex="-1" colorroleid="987324577870929940" index="1" role="listitem" data-list-item-id="members-987320208253394947___1">
My code:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
import time

users = bot.find_elements(By.CLASS_NAME, 'member-2gU6Ar')
time.sleep(5)
try:
    for user in users:
        user.click()
        message = bot.find_element(By.XPATH, '//body[1]/div[1]/div[2]/div[1]/div[3]/div[1]/div[1]/div[1]/div[5]/div[1]/input[1]')
        time.sleep(5)
        message.send_keys('Automated' + Keys.ENTER)
except NoSuchElementException:
    pass
The class that you see over here member-2gU6Ar container-1oeRFJ clickable-28SzVr is not a single class, it is a combination of multiple classes separated with space.
So using member-2gU6Ar would not work as expected.
You can remove the spaces and put a . to make a CSS selector though.
div.member-2gU6Ar.container-1oeRFJ.clickable-28SzVr
I would not really suggest that, though, since it contains alphanumeric hashes that may change over time.
Here I have written an xpath:
//div[starts-with(@class,'member') and contains(@class, 'container') and @index]
This should match all the divs with the specified attributes.
You can use it probably like this:
users = bot.find_elements(By.XPATH, "//div[starts-with(@class,'member') and contains(@class, 'container') and @index]")
i = 0  # the index attribute starts at 0 in the sample HTML
time.sleep(5)
try:
    for user in users:
        ele = bot.find_element(By.XPATH, f"//div[starts-with(@class,'member') and contains(@class, 'container') and @index='{i}']")
        ele.click()
        message = bot.find_element(By.XPATH, '//body[1]/div[1]/div[2]/div[1]/div[3]/div[1]/div[1]/div[1]/div[5]/div[1]/input[1]')
        time.sleep(5)
        message.send_keys('Automated' + Keys.ENTER)
        i = i + 1
except NoSuchElementException:
    pass
However, I would recommend using a relative XPath rather than the absolute XPath //body[1]/div[1]/div[2]/div[1]/div[3]/div[1]/div[1]/div[1]/div[5]/div[1]/input[1].
Hope this helps.
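The class predicates in that XPath can be sanity-checked without a browser. A pure-Python mirror of starts-with(@class,'member') and contains(@class,'container'), run against the sample class string from the question:

```python
def matches_member(class_attr):
    # Mirrors starts-with(@class,'member') and contains(@class,'container'):
    # only the stable prefixes are matched, so the locator survives a site
    # rebuild that regenerates the hashed suffixes (2gU6Ar, 1oeRFJ, ...).
    return class_attr.startswith("member") and "container" in class_attr
```

This is why the XPath keys on prefixes instead of the full class value.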
I am trying to automate adding items to the cart in an online shop; however, I got stuck on a loop that should differentiate whether an item is available or not.
Here's the loop:
while True:
    if ???:  # <-- this is the condition I'm missing
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='" + size.get() + "']"))).click()
        sleep(1)
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Add to cart']"))).click()
        sleep(1)
        print("Success!")
        break
    else:
        driver.refresh()
        sleep(3)
If the size is available, the button is active:
<div class="styles__ArticleSizeItemWrapper-sc-dt4c4z-4 eQqdpu">
<button aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs">
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 styles__ArticleSizeItemTitle-sc-1n1fwgw-3 gnSCRf cLhSqA bipwfD">XL</span>
<span class="styles__StyledText-sc-cia9rt-0 ffGzxX">
</span>
</button>
</div>
If not, the button is disabled:
<div class="styles__ArticleSizeItemWrapper-sc-dt4c4z-4 eQqdpu">
<button disabled="" aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 fBeTLI">
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 styles__ArticleSizeItemTitle-sc-1n1fwgw-3 gnSCRf cLhSqA bipwfD">XXL</span>
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 kQJTJc cLhSqA">
</span>
</button>
</div>
The question is: what should be the condition for this loop?
I have tried something like this:
if (driver.find_elements(By.XPATH, "//*[contains(@class='styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and text()='" + e2.get() + "']")):
but I kept getting an invalid XPath expression error.
EDIT: Replaced "=" with "," in the above code as follows:
if (driver.find_elements(By.XPATH, "//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and text()='" + e2.get() + "']")):
EDIT: The error is gone, but the browser keeps refreshing via the else statement (the element is not found).
I believe your error is in the use of the contains function, which expects two parameters: a string and a substring, whereas you're passing it a boolean expression (@class='styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs').
I expect this is just a typo and you actually meant to type contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') (NB comma instead of an equals sign after @class).
Also, you are looking for a button element which has a child text node (text() refers to a text node) which is equal to the size you're looking for, but that text node is actually a child of a span which is a child of the button. You can compare your size to the text value of that span.
Try something like this:
"//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and span='"
+ e2.get()
+ "']"
e3 = "Some value"
x = f"//button[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and not(@disabled) and ./span[contains(text(), '{e3}')]]"
print(x)
Try looking for the button which contains that class and has that span, and maybe check whether the button is disabled.
I managed to get it working using this condition:
if driver.find_elements(By.XPATH,
        "//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs')"
        " and .//*[text()='" + e2.get() + "']]"):
It is quite similar to the original approach; however, adding .//* before text() did the trick.
Without .//*, find_elements was looking in the same node, which resulted in not finding the element. .//* instructs find_elements to look in the child nodes where the element exists.
Important: the text condition was wrapped in an additional pair of [] brackets.
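The availability condition can also be expressed as a plain disabled-attribute check. A stdlib sketch (xml.etree standing in for Selenium) run over trimmed copies of the two sample buttons above:

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the available (XL) and sold-out (XXL) buttons.
SAMPLE = """<div>
<button aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs"><span>XL</span></button>
<button disabled="" aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 fBeTLI"><span>XXL</span></button>
</div>"""

def available_sizes(fragment):
    # A size is purchasable when its button carries no disabled attribute;
    # the size label lives in the button's first child span.
    root = ET.fromstring(fragment)
    return [btn.find("span").text
            for btn in root.findall(".//button")
            if btn.get("disabled") is None]
```

In live Selenium the same idea is the not(@disabled) predicate shown above.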
I'm trying to get a name and contact number from a div, but the div has a varying number of spans: sometimes one, sometimes two, and sometimes three.
The first span has the name.
The second span has other data.
The third span has the contact number.
Here is HTML
<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
  <span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth budinich</span>
  <span class="listing-field"><a href="http://Www.redfin.com" target="_blank">See listing website</a></span>
  <span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206) 793-8336</span>
</div>
Here is my Code
try:
    name = browser.find_element_by_xpath("//span[@class='listing-field'][1]")
    name = name.text.strip()
    print("name : " + name)
except:
    print("name is missing")
    name = "N/A"
try:
    contact_info = browser.find_element_by_xpath("//span[@class='listing-field'][3]")
    contact_info = contact_info.text.strip()
    print("contact info : " + contact_info)
except:
    print("contact_info is missing")
    contact_info = "N/A"
My code is not giving me the correct result. Can anyone provide the best possible solution? Thanks.
You can iterate through the contacts and check whether there is a child a element, or whether the text matches a phone-number pattern:
import re

contacts = browser.find_elements_by_css_selector("span.listing-field")
contact_name = []
contact_phone = "N/A"
contact_web = "N/A"
for i in range(0, len(contacts)):
    if len(contacts[i].find_elements_by_tag_name("a")) > 0:
        contact_web = contacts[i].find_element_by_tag_name("a").get_attribute("href")
    elif re.search("\\(\\d+\\)\\s+\\d+-\\d+", contacts[i].text):
        contact_phone = contacts[i].text
    else:
        contact_name.append(contacts[i].text)
contact_name = ", ".join(contact_name) if len(contact_name) > 0 else "N/A"
Output:
contact_name: 'Kevin Howard, Howard enterprise'
contact_phone: '(206) 334-8414'
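The phone branch hinges entirely on the regex, which can be checked on its own. Here the sample values from the HTML are classified with the same pattern and branch order as the loop above:

```python
import re

# Same pattern as above: "(digits)" then whitespace then "digits-digits".
PHONE = re.compile(r"\(\d+\)\s+\d+-\d+")

def classify(text, has_link):
    # Same branch order as the loop: a child link wins, then the phone
    # pattern, and anything left over is treated as a name.
    if has_link:
        return "web"
    if PHONE.search(text):
        return "phone"
    return "name"
```

Note that \s+ also matches the newline inside the phone span, which is why the pattern works on the wrapped sample text.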
The page has a captcha. For scraping, it is better to use requests; all the information is provided in JSON format.
# sudharsan
# April 07 2019
from bs4 import BeautifulSoup

text = '''<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
<span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth
budinich</span>
<span class="listing-field"><a href="http://Www.redfin.com"
target="_blank">See listing website</a></span>
<span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206)
793-8336</span>
</div>'''
# the given sample html is stored as input in a variable called "text"
soup = BeautifulSoup(text, "html.parser")
main = soup.find_all(class_="listing-field")
# now the spans with class name "listing-field" are stored as a list in "main"
print(main[0].text)
# it will print the first span element
print(main[-1].text)
# it will print the last span element
# thank you
# if you like the code, vote for it
<div class="inner-article">
<a style="height:150px;" href="this is a link"><img width="150" height="150" src="this is an image" alt="K1 88ahiwyu"></a>
<h1><a class="name-link" href="/shop/jackets/pegroxdya/dao7kdzej">title</a></h1>
<p><a class="name-link" href="/shop/jackets/pegroxdya/dao7kdzej">subtitle</a></p>
</div>
Hello!
I need to find an XPath to get the div with class="inner-article" by the title and subtitle of its two a children. The website I want to operate on has a lot of these inner articles, and I need to find a specific one given only a title and a subtitle.
E.g., the website has an inner article with the title "Company® Leather Work Jacket" and a subtitle with its colour "Silver".
Now I need to be able to find the div element even if I only have the keywords "Work Jacket" for the title and "Silver" for the subtitle.
This is what I came up with already:
e1 = driver.find_element_by_xpath("//*[text()[contains(.,'" + kw + "')]]")
kw is a string which contains the keywords for the title. If I print it out, it correctly returns the a element, and clicking on it works too, but it's not specific enough, because there are more objects which also have these keywords in their title. That is why I also need the subtitle, which always contains the colour (here referred to as the string clr):
e2 = driver.find_element_by_xpath("//*[text()[contains(.,'" + clr + "')]]")
This also works and clicks on the subtitle correctly, but the colour alone would likewise return multiple objects on the website.
That's why I need to find the "div" element with keywords for the title and the color for the subtitle.
I've tried this but it doesn't work:
e1 = driver.find_element_by_xpath("//*[text()[contains(.,'" + kw + "') and contains(.,'" + clr + "')]]")
Try
driver.find_element_by_xpath("//div[h1/a[contains(text(),'" + kw + "')] and p/a[contains(text(),'" + clr + "')]]")
You can learn more XPath grammar by referring to this link.
In your case you can use an XPath like this:
("//*[text()[contains(.,'" + kw + "')]]/ancestor::div[1]")
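The combined title-and-subtitle condition can be checked offline too. A stdlib sketch (xml.etree in place of Selenium) mirroring the div[h1/a ... and p/a ...] locator against a trimmed copy of the sample article (the title text is the example from the question):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div class="inner-article">
<h1><a class="name-link" href="/shop/jackets/pegroxdya/dao7kdzej">Company® Leather Work Jacket</a></h1>
<p><a class="name-link" href="/shop/jackets/pegroxdya/dao7kdzej">Silver</a></p>
</div>"""

def matches(article, kw, clr):
    # True when the h1 link text contains the title keyword AND the p
    # link text contains the colour, mirroring the combined XPath above.
    title = article.find("h1/a").text or ""
    subtitle = article.find("p/a").text or ""
    return kw in title and clr in subtitle
```

Requiring both conditions on the same div is what removes the ambiguity that either keyword alone leaves behind.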
I'm having a div table where each row has two cells/columns.
The second cell/column sometimes holds plain text (<div class="something">Text</div>), while sometimes the text is wrapped in an "a" tag inside: <div class="something"><a href="...">Text</a></div>.
Now, I have no problem getting everything but the linked text. I can also get the linked text separately, but I don't know how to get everything at once, so that I get three columns of data:
1. first column text,
2. second column text no matter if it is linked or not,
3. link, if it exist
The code that extracts everything not linked and works is:
times = scrapy.Selector(response).xpath('//div[contains(concat(" ", normalize-space(@class), " "), " time ")]/text()').extract()
titles = scrapy.Selector(response).xpath('//div[contains(concat(" ", normalize-space(@class), " "), " name ")]/text()').extract()
for time, title in zip(times, titles):
    print(time.strip(), title.strip())
I can get the linked items only with
ltitles = scrapy.Selector(response).xpath('//div[contains(concat(" ", normalize-space(@class), " "), " name ")]/a/text()').extract()
for ltitle in ltitles:
    print(ltitle.strip())
But don't know how to combine the "query" to get everything together.
Here's a sample HTML:
<div class="programRow rowOdd">
<div class="time ColorVesti">
22:55
</div>
<div class="name">
Dnevnik
</div>
</div>
<div class="programRow rowEven">
<div class="time ColorOstalo">
23:15
</div>
<div class="name">
<a class="recnik" href="/page/tv/sr/story/20/rts-1/2434373/kulturni-dnevnik.html" rel="/ajax/storyToolTip.jsp?id=2434373">Kulturni dnevnik</a>
</div>
</div>
Sample output (one I cannot get):
22:55, Dnevnik, []
23:15, Kulturni dnevnik, /page/tv/sr/story/20/rts-1/2434373/kulturni-dnevnik.html
I either get the first two columns (without the linked text) or just the linked text with the code samples above.
If I understand you correctly, you should probably just iterate through the program nodes and create an item on every cycle. Also, there's the XPath shortcut //text(), which captures all text under the node and its children.
Try something like:
programs = response.xpath("//div[contains(@class,'programRow')]")
for program in programs:
    item = dict()
    item['time'] = program.xpath(".//div[contains(@class,'time')]//text()").extract_first()
    item['name'] = program.xpath(".//div[contains(@class,'name')]//text()").extract_first()
    item['link'] = program.xpath(".//div[contains(@class,'name')]/a/@href").extract_first()
    yield item
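The //text() shortcut can be seen in action without Scrapy. A stdlib sketch using xml.etree, where itertext() plays the role of //text(), run over the two sample rows from the question:

```python
import xml.etree.ElementTree as ET

# The two sample programRow blocks from the question, wrapped in one root.
SAMPLE = """<div>
<div class="programRow rowOdd">
<div class="time ColorVesti">22:55</div>
<div class="name">Dnevnik</div>
</div>
<div class="programRow rowEven">
<div class="time ColorOstalo">23:15</div>
<div class="name"><a class="recnik" href="/page/tv/sr/story/20/rts-1/2434373/kulturni-dnevnik.html">Kulturni dnevnik</a></div>
</div>
</div>"""

def extract_rows(fragment):
    # One dict per programRow: itertext() collects the text whether or
    # not it is wrapped in a link, and the link itself stays optional.
    out = []
    for row in ET.fromstring(fragment):
        cells = {cell.get("class", "").split()[0]: cell for cell in row}
        name_div = cells["name"]
        link = name_div.find("a")
        out.append({
            "time": "".join(cells["time"].itertext()).strip(),
            "name": "".join(name_div.itertext()).strip(),
            "link": link.get("href") if link is not None else None,
        })
    return out
```

This produces exactly the three-column output asked for: time, name regardless of linking, and the href when present.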