I'm trying to create a function that locates any "add to cart" button on a given website by searching for the text on the button. For example, Amazon says "Add To Cart", so I am using this function to try to locate the button. Unfortunately I'm getting:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[.='Add To Cart']"}
def GetElementByText(driver, url, text):
    driver.get(url)
    element = driver.find_element_by_xpath("//div[.='" + text + "']")
    print(element)
    return element
element = GetElementByText(driver, 'https://www.amazon.com/gp/product/B00ZG9U0KA?pf_rd_r=AQC5SP1PPERA8C37YCC8&pf_rd_p=5ae2c7f8-e0c6-4f35-9071-dc3240e894a8', 'Add To Cart')
I've also tried using this function, which works on other websites but not on Amazon.
def GetButtons(driver, url):
    driver.get(url)
    html = driver.page_source
    driver.quit()
    soup = BeautifulSoup(html, 'html.parser')
    buttons = soup.find_all('button')
    return buttons
GetButtons(driver, 'https://www.amazon.com/gp/product/B00ZG9U0KA?pf_rd_r=AQC5SP1PPERA8C37YCC8&pf_rd_p=5ae2c7f8-e0c6-4f35-9071-dc3240e894a8')
Is there an easier way to accomplish this dynamically, in a way that would be easy to apply to other websites? My concern is that some websites have buttons and some have links. Returning all the links or tags using BeautifulSoup returns too many results to practically sort through.
Any ideas for how to accomplish this? The function wouldn't necessarily have to find the button automatically on its own (though that would be great), but if I could narrow it down to 10-20 possible results to search through, that would be perfect.
Two things.
On Amazon, the text is "Add to Cart" with a lowercase "to"; you have it as an uppercase "To."
By changing your xpath to "//*[.='" + text + "']", I was able to find it (after correcting the To/to error). It might be too broad for general application, but worth a try.
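It might also be worth making the text match case-insensitive, so the To/to distinction stops mattering in the first place. A minimal sketch (the function names here are mine, not from the question; XPath 1.0 has no lower-case(), so translate() does the folding):

```python
def build_ci_text_xpath(text):
    """Build an XPath matching any element whose normalized text equals
    `text`, ignoring ASCII case (translate() maps A-Z to a-z)."""
    return (
        "//*[translate(normalize-space(.), "
        "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
        "'abcdefghijklmnopqrstuvwxyz')='" + text.lower() + "']"
    )

def get_element_by_text_ci(driver, url, text):
    """Navigate to `url` and return the first element whose text matches."""
    from selenium.webdriver.common.by import By  # deferred so the builder is testable alone
    driver.get(url)
    return driver.find_element(By.XPATH, build_ci_text_xpath(text))
```

Note that `//*` can match an ancestor whose entire string value happens to equal the text, so on a real page you may still need to narrow the result to the clickable element.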
I've been looking through questions and answers like this one: Selenium Unable to Locate Element by ID
Most elements appear inaccessible by ID or XPath. I'm attempting to find and click the element that has the text "Add Parent":
I've tried things like:
browser.find_element(By.XPATH, "/html/body/div/main/fs-person-page//iron-pages/fs-tree-person-details//div/section/fs-tree-person-family//fs-tree-collapsable-card/fs-family-members//div[2]/section[2]/div/button[2]/span/fs-inner-html")
and
browser.find_element(By.XPATH, "//fs-inner-html[text() = 'Add Parent']")
(similarly, finding by ids and classes doesn't seem to be working)
They solve the problem of not being able to find an element by switching to the iframe in which the element resides. The web page in which I'm searching for elements doesn't have any iframes. Do I need to be switching to something else? How can I determine what frame-like element I should be switching to?
Thanks!
Anson
The HTML of the webpage that I'm trying to scrape can be found here.
The issue with the
browser.find_element(By.XPATH, "//fs-inner-html[text() = 'Add Parent']")
xpath is that the text probably doesn't exactly equal 'Add Parent'.
There may be white-space on either side, or the inner span 'to Maria Danko' is causing it to fail.
Just as a sanity check, could you see if any of these work?
button.add-person //css selector
button.add-person[data-test-add-parent=''] //css selector
//fs-inner-html[contains(text(), 'Add Parent')] //xpath
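Those candidates can be tried in order with a small helper; a sketch (the helper name is mine, and `find` stands in for Selenium's `driver.find_element`, which raises when nothing matches):

```python
def first_match(find, candidates):
    """Return the first (selector, element) pair for which find(kind, sel)
    succeeds, or None if every candidate fails. `find` is expected to raise
    on failure, like Selenium's driver.find_element."""
    for kind, sel in candidates:
        try:
            return sel, find(kind, sel)
        except Exception:
            continue
    return None

# The selector strings suggested above:
candidates = [
    ("css selector", "button.add-person"),
    ("css selector", "button.add-person[data-test-add-parent='']"),
    ("xpath", "//fs-inner-html[contains(text(), 'Add Parent')]"),
]
# With Selenium: first_match(driver.find_element, candidates)
```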
Here is the inspect result for the button that says +5 per day
<span class="text user-links entry-method-title ng-scope ng-binding" ng-include="::'views/entry-text/'+entry_method.entry_type+'.html'">
Click For a Daily Bonus Entry
</span>
<div class="entry-method bonus template" data-remove-popovers="" id="em6129519" ng-class="{expanded: entryState.expanded == entry_method, template: entry_method.template, 'completed-entry-method': !canEnter(entry_method) && isEntered(entry_method)}" ng-repeat="entry_method in ::entry_methods">
Above is the HTML I get when I inspect the link/button. I have tried XPath, CSS selectors, link text, and class name, and each attempt gives me an error saying the element cannot be identified. Does anyone have a suggestion for how to identify it? It is on gleam.io for a giveaway; I'm trying to automate this so I don't have to log in and press it every day. This is my first web-interfacing project with Python.
Here is my most recent try
driver.maximize_window()
time.sleep(10)
driver.execute_script("window.scrollTo(0, 1440)")
time.sleep(10)
button2 = driver.find_element_by_class_name("text user-links entry-method-title ng-scope ng-binding")
button2.click()
Similar to a previous issue, Selenium find_element_by_class_name and find_element_by_css_selector not working, you can't have spaces in your class name when using driver.find_element_by_class_name. Instead, find the element via css_selector and replace each space with a dot.
driver.find_element_by_css_selector("span.text.user-links.entry-method-title.ng-scope.ng-binding")
That'll fix what you have above, but keep in mind there are other ways to make selenium actions more reliable (eg. WebDriverWait, etc). And there may be a cleaner selector to use than the one above.
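Both points can be folded into a small helper: convert the space-separated class attribute into a dotted CSS selector, and wrap the click in a WebDriverWait so the element has time to appear. A sketch (the helper names are mine, not from the answer):

```python
def css_from_classes(tag, class_attr):
    """Turn a multi-class `class` attribute value into a CSS selector,
    e.g. ('span', 'text user-links') -> 'span.text.user-links'."""
    return tag + "." + ".".join(class_attr.split())

def click_when_clickable(driver, css, timeout=10):
    """Wait until the element matching `css` is clickable, then click it.
    A sketch around WebDriverWait, not the only way to do this."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    button = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css))
    )
    button.click()
    return button

# Usage with the class attribute from the question:
# click_when_clickable(driver, css_from_classes(
#     "span", "text user-links entry-method-title ng-scope ng-binding"))
```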
I believe the element you want to access is contained within an "iframe", thus you must first switch to iframe before you can access it using selectors.
driver.switch_to.frame(driver.find_element_by_xpath("PUT IFRAME XPATH HERE"))
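If the button does live in an iframe, a slightly more robust pattern is to wait for the frame and switch into it in one step, then switch back to the main document when done. A sketch (the function name is mine, and the locator is still a placeholder you would fill in):

```python
def run_in_iframe(driver, iframe_locator, action, timeout=10):
    """Switch into the iframe matched by `iframe_locator` (a (By, value)
    tuple), run `action(driver)` inside it, then switch back to the main
    document even if the action fails."""
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    WebDriverWait(driver, timeout).until(
        EC.frame_to_be_available_and_switch_to_it(iframe_locator)
    )
    try:
        return action(driver)
    finally:
        driver.switch_to.default_content()
```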
Scraping links should be a simple feat, usually just grabbing the href value of the a tag.
I recently came across this website (https://sunteccity.com.sg/promotions) where the href value of a tags of each item cannot be found, but the redirection still works. I'm trying to figure out a way to grab the items and their corresponding links. My typical python selenium code looks something as such
all_items = bot.find_elements_by_class_name('thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))
However, I can't seem to retrieve any href or onclick attributes, and I'm wondering if this is even possible. I also noticed that I couldn't right-click and open the link in a new tab.
Are there any ways around getting the links of all these items?
Edit: Are there any ways to retrieve all the links of the items on the pages?
i.e.
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
...
Edit:
Adding an image of one such anchor tag for better clarity:
Reverse-engineering the JavaScript that takes you to the promotion pages (seen in https://sunteccity.com.sg/_nuxt/d4b648f.js) gives you a way to get all the links, which are based on the HappeningID. You can verify by running this in the JS console, which gives you the first promotion:
window.__NUXT__.state.Promotion.promotions[0].HappeningID
Based on that, you can create a Python loop to get all the promotions:
items = driver.execute_script("return window.__NUXT__.state.Promotion;")
base = "https://sunteccity.com.sg/promotions/"
for item in items["promotions"]:
    happening_id = str(item["HappeningID"])
    print(base + happening_id)
That generated the following output:
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
https://sunteccity.com.sg/promotions/764
https://sunteccity.com.sg/promotions/766
https://sunteccity.com.sg/promotions/762
https://sunteccity.com.sg/promotions/767
https://sunteccity.com.sg/promotions/732
https://sunteccity.com.sg/promotions/733
https://sunteccity.com.sg/promotions/735
https://sunteccity.com.sg/promotions/736
https://sunteccity.com.sg/promotions/737
https://sunteccity.com.sg/promotions/738
https://sunteccity.com.sg/promotions/739
https://sunteccity.com.sg/promotions/740
https://sunteccity.com.sg/promotions/741
https://sunteccity.com.sg/promotions/742
https://sunteccity.com.sg/promotions/743
https://sunteccity.com.sg/promotions/744
https://sunteccity.com.sg/promotions/745
https://sunteccity.com.sg/promotions/746
https://sunteccity.com.sg/promotions/747
https://sunteccity.com.sg/promotions/748
https://sunteccity.com.sg/promotions/749
https://sunteccity.com.sg/promotions/750
https://sunteccity.com.sg/promotions/753
https://sunteccity.com.sg/promotions/755
https://sunteccity.com.sg/promotions/756
https://sunteccity.com.sg/promotions/757
https://sunteccity.com.sg/promotions/758
https://sunteccity.com.sg/promotions/759
https://sunteccity.com.sg/promotions/760
https://sunteccity.com.sg/promotions/761
https://sunteccity.com.sg/promotions/763
https://sunteccity.com.sg/promotions/765
https://sunteccity.com.sg/promotions/730
https://sunteccity.com.sg/promotions/734
https://sunteccity.com.sg/promotions/623
You are using the wrong locator. It brings you a lot of irrelevant elements.
Instead of find_elements_by_class_name('thumb-img'), please try find_elements_by_css_selector('.collections-page .thumb-img'), so your code will be:
all_items = bot.find_elements_by_css_selector('.collections-page .thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))
You can also get the desired links directly with the .collections-page .thumb-img a locator, so that your code could be:
links = bot.find_elements_by_css_selector('.collections-page .thumb-img a')
for link in links:
    print(link.get_attribute("href"))
I was attempting to solve this issue for some time and tried multiple solutions posted on here prior to opening this question.
I am currently attempting to run a scraper with the following code:
website = 'https://www.abitareco.it/nuove-costruzioni-milano.html'
path = Path().joinpath('util', 'chromedriver')
driver = webdriver.Chrome(path)
driver.get(website)
main = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "p1")))
My goal hyperlink has the word scheda in it:
i = driver.find_element_by_xpath('.//a[contains(@href, "scheda")]')
i.text
My first issue is that find_element_by_xpath only outputs a single hyperlink, and the second issue is that it is not extracting anything so far.
I'd appreciate any help and/or guidance.
You need to use find_elements instead :
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.text)
Note that find_elements will return a list of web elements, whereas find_element returns a single web element.
If you are specifically looking for the href attribute, then you can try the below code:
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.get_attribute('href'))
There are two issues, looking at the website.
You want to find all elements, not just one, so you need to use find_elements, not find_element.
The anchors actually don't have any text in them, so .text won't return anything.
Assuming what you want is to scrape the URLs of all these links, you can use .get_attribute('href') instead of .text, like so:
url_list = driver.find_elements(By.XPATH, './/a[contains(@href, "scheda")]')
for i in url_list:
    print(i.get_attribute('href'))
It will detect all web elements that match your criteria and store them in a list. I just used print as an example, but obviously you may want to do more than just print the links.
A particular button (which allows me to jump to the second page) has the href
inputHref = /letsdeal?sectionLoadingID=m_timeline_loading_div_1485935999_0_36_timeline_unit%3A1%3A00000000001483240170%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001483240170%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&timeend=1485935999&timestart=0&tm=AQBwkKKSIKOhqAju&refid=17
and if I click on this button a second page opens up, and a button (which takes me to the third page) has the href
inputHref = /letsdeal?sectionLoadingID=m_timeline_loading_div_1485935999_0_36_timeline_unit%3A1%3A00000000001482227114%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001482227114%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&timeend=1485935999&timestart=0&tm=AQBwkJZSIKOhqAju&refid=17
Both hrefs are different at the end but similar at the start. How can I locate both of these buttons with a single XPath formula, just like the following code?
extendButton = driver.wait.until(EC.presence_of_element_located(
    (By.XPATH, "//a[contains(@href, '" + inputHref + "')]")))
You can apply a partial match using contains():
//a[contains(#href, "letsdeal")]
Or:
//a[contains(@href, "/letsdeal")]
Or, with a CSS selector:
driver.find_element_by_css_selector("a[href*=letsdeal]")
Note that I don't know how unique the "letsdeal" substring is on your page, or whether it appears in other href attribute values.
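Since each page's "next" link matches the same partial href, the whole pagination can be driven by one loop that keeps clicking until no such link appears. A sketch (the function names are mine, and the timeout/page-limit values are illustrative):

```python
def pagination_xpath(href_substring):
    """XPath for an anchor whose href contains `href_substring`."""
    return "//a[contains(@href, '%s')]" % href_substring

def follow_pagination(driver, href_substring="letsdeal", timeout=10, max_pages=50):
    """Repeatedly click the link whose href contains `href_substring`
    until the wait times out (i.e. no further page link exists)."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    for _ in range(max_pages):
        try:
            link = WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located(
                    (By.XPATH, pagination_xpath(href_substring))
                )
            )
        except Exception:  # TimeoutException once no next-page link remains
            break
        link.click()
```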