Click link in Kickstarter using Selenium - python

I am attempting to scrape Kickstarter based on the project names alone. Using the project name and the base URL I can get to the search page. In order to scrape the project page, I need to use Selenium to click on the URL. However, I cannot point Selenium to the correct element to click on. I would also like this to be dynamic so I do not need to put the project name each time.
<div class="type-18 clamp-5 navy-500 mb3">
<a href="https://www.kickstarter.com/projects/1980119549/knife-block-designed-by-if-and-red-dot-winner-jle?ref=discovery&term=Knife%20block%20-%20Designed%20by%20IF%20and%20Red%20dot%20winner%20JLE%20Design"
class="soft-black hover-text-underline">Knife block - Designed by IF and Red dot winner JLE Design
</a>
</div>
driver = webdriver.Chrome(chrome_path)
url = 'https://www.kickstarter.com/discover/advanced?ref=nav_search&term=Knife block - Designed by IF and Red dot winner JLE Design'
driver.get(url)
elem = driver.find_elements_by_link_text('Knife block - Designed by IF and Red dot winner JLE Design')
elem.click()
How can I get the elem to point to the correct link?

Regarding your attempt, your code has a typo: find_elements_by_link_text returns a list of elements, so calling .click() on it won't work. You meant to use find_element_by_link_text.
To dynamically click links, use an XPath instead. The resulting code would be:
elem = driver.find_element_by_xpath('//div[contains(@class, "type-18")]/a')
elem.click()
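To make this work for a list of project names without hard-coding each one, here is a minimal sketch. It assumes Selenium 4 (where `driver.find_element(By.XPATH, ...)` replaces the `find_element_by_*` helpers removed in 4.3) and that Kickstarter's search page still accepts the `ref`/`term` query parameters from the question. `search_url` and `open_first_result` are hypothetical helper names, and the class-based XPath may break if Kickstarter changes its markup.

```python
from urllib.parse import urlencode

# Assumption: Kickstarter search still uses ?ref=...&term=... as in the question.
SEARCH_BASE = "https://www.kickstarter.com/discover/advanced"
# XPath from the answer above: first link inside a result-title <div>.
RESULT_LINK_XPATH = '(//div[contains(@class, "type-18")]/a)[1]'


def search_url(project_name):
    """Hypothetical helper: build the search URL with the name safely encoded."""
    return SEARCH_BASE + "?" + urlencode({"ref": "nav_search", "term": project_name})


def open_first_result(driver, project_name, timeout=10):
    """Hypothetical helper: search for project_name and click the first result."""
    # Selenium is imported here so search_url() can be reused without it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(search_url(project_name))
    # Wait until the result link is clickable instead of assuming the page loaded.
    link = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, RESULT_LINK_XPATH))
    )
    link.click()


# Usage (needs a real browser):
# from selenium import webdriver
# driver = webdriver.Chrome()
# for name in ["Knife block - Designed by IF and Red dot winner JLE Design"]:
#     open_first_result(driver, name)
#     # ... scrape the project page here ...
```

`urlencode` escapes spaces and characters such as `&` in the project name, so the raw name can be passed in directly.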
This grabs the first match. You could use find_elements and iterate over the results, but that is a bad approach here: each click navigates away, so the elements found on the previous page become stale. If there is more than one link, you can use the same XPath with a (1-based) index:
first_elem = driver.find_element_by_xpath('(//div[contains(@class, "type-18")]/a)[1]')
first_elem.click()
# ...
second_elem = driver.find_element_by_xpath('(//div[contains(@class, "type-18")]/a)[2]')
second_elem.click()
# And so forth...
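Generalizing the indexed XPath into a loop might look like the following sketch (Selenium 4 API assumed). `nth_result_xpath`, `visit_each_result`, and the `scrape` callback are hypothetical names. The loop re-locates each link by position after `driver.back()`, so it never reuses a stale element; it assumes the result list reloads in the same order when you navigate back.

```python
RESULT_LINKS_XPATH = '//div[contains(@class, "type-18")]/a'


def nth_result_xpath(n):
    """Hypothetical helper: XPath for the n-th result link (XPath positions start at 1)."""
    if n < 1:
        raise ValueError("XPath positions start at 1")
    return f"({RESULT_LINKS_XPATH})[{n}]"


def visit_each_result(driver, scrape, timeout=10):
    """Click every result by index, calling the (hypothetical) scrape(driver) on each page."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Count the results once they are present on the search page.
    count = len(wait.until(EC.presence_of_all_elements_located((By.XPATH, RESULT_LINKS_XPATH))))
    results = []
    for n in range(1, count + 1):
        # Re-find the link on every pass: the previous reference is stale after navigation.
        link = wait.until(EC.element_to_be_clickable((By.XPATH, nth_result_xpath(n))))
        link.click()
        results.append(scrape(driver))
        driver.back()
    return results
```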

Related

How do you turn a string with url into a url that is clickable using selenium?

url="https://fourminutebooks.com/book-summaries/"
driver.get(url)
fin_list = []
page_tabs = driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")
for i in range(len(page_tabs)):
    page_tabs[i] = page_tabs[i].get_attribute("href")
    fin_list.append(page_tabs[i])
fin_list[0].click()
#html = driver.find_elements(By.CSS_SELECTOR,"header[class='entry-header page-header']")
print(fin_list)
I am trying to create a program that randomly emails me book summaries, and I am having difficulty clicking a link to get its HTML content. I have managed to collect all the links, but they are saved as strings, and I cannot click on one of them without getting an error.
You cannot turn a string into a WebElement object.
They are fundamentally different objects.
As a user, you can click a web element on a web page.
Similarly, Selenium can click a web element. It cannot click a string or an int.
If you want to collect a list of clickable objects, you have to collect clickable web elements.
In your particular case, you can keep the page_tabs list of elements and click the page_tabs[0] element later with
page_tabs[0].click()
To simply access a URL from the list, you can do:
page_tabs = [x.get_attribute('href') for x in driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")]
driver.get(page_tabs[0])
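Following the second answer, here is a sketch that keeps the links as plain strings and opens one at random with `driver.get()` (no clicking needed), which also fits the goal of emailing a random summary. It assumes Selenium 4's `find_elements(By.CSS_SELECTOR, ...)` and reuses the selector from the question; `clean_links`, `collect_links`, and `open_random_summary` are hypothetical helper names.

```python
import random

# Selector taken from the question.
SUMMARY_LINK_CSS = "a[class='post_title w4pl_post_title']"


def clean_links(hrefs):
    """Drop empty values and duplicates while keeping the original order."""
    seen = set()
    links = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


def collect_links(driver, url):
    """Return the href of every summary link on the page as plain strings."""
    from selenium.webdriver.common.by import By

    driver.get(url)
    elements = driver.find_elements(By.CSS_SELECTOR, SUMMARY_LINK_CSS)
    return clean_links(el.get_attribute("href") for el in elements)


def open_random_summary(driver, links, rng=random):
    """Navigate to one randomly chosen link; strings are visited, not clicked."""
    choice = rng.choice(links)
    driver.get(choice)
    return choice
```

Because the links are strings, later pages can be visited in any order without running into stale elements.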

Trying to click entry buttons on gleam using selenium but am having trouble identifying the element

Here is the inspect result for the button that says +5 per day
<span class="text user-links entry-method-title ng-scope ng-binding" ng-include="::'views/entry-text/'+entry_method.entry_type+'.html'">
Click For a Daily Bonus Entry
</span>
<div class="entry-method bonus template" data-remove-popovers="" id="em6129519" ng-class="{expanded: entryState.expanded == entry_method, template: entry_method.template, 'completed-entry-method': !canEnter(entry_method) && isEntered(entry_method)}" ng-repeat="entry_method in ::entry_methods">
Here is the HTML shown when I inspect the link/button. I have tried XPath, CSS, link text, and class name, and each time I get an error saying the element cannot be found. Does anyone have a suggestion for how to locate it? It is on gleam.io for a giveaway; I'm trying to automate this so I don't have to log in and press it every day. This is my first web-automation project with Python.
Here is my most recent attempt:
driver.maximize_window()
time.sleep(10)
driver.execute_script("window.scrollTo(0, 1440)")
time.sleep(10)
button2 = driver.find_element_by_class_name("text user-links entry-method-title ng-scope ng-binding")
button2.click()
Similar to a previous question (Selenium find_element_by_class_name and find_element_by_css_selector not working), you can't have spaces in the class name you pass to driver.find_element_by_class_name, because it accepts only a single class. Instead, find the element with a CSS selector and replace each space with a dot:
driver.find_element_by_css_selector("span.text.user-links.entry-method-title.ng-scope.ng-binding")
That will fix the code above, but keep in mind there are other ways to make Selenium actions more reliable (e.g. WebDriverWait instead of fixed time.sleep calls), and there may be a cleaner selector than the one above.
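A sketch combining the dot-joined CSS selector with an explicit wait in place of the fixed `time.sleep(10)` calls, assuming Selenium 4. `compound_class_to_css` and `click_when_ready` are hypothetical helpers. Classes such as `ng-scope` and `ng-binding` are added by AngularJS, so a shorter selector like `span.entry-method-title` may prove sturdier; that is an untested assumption about this page.

```python
def compound_class_to_css(tag, class_attr):
    """Turn a class attribute with spaces ('a b c') into a CSS selector ('tag.a.b.c')."""
    return tag + "".join("." + cls for cls in class_attr.split())


def click_when_ready(driver, css_selector, timeout=15):
    """Hypothetical helper: wait for the element to be clickable, then click it."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Poll until the element is visible and enabled instead of sleeping a fixed time.
    button = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
    )
    button.click()


# Usage (needs a live driver on the giveaway page):
# selector = compound_class_to_css(
#     "span", "text user-links entry-method-title ng-scope ng-binding")
# click_when_ready(driver, selector)
```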
I believe the element you want to access is inside an iframe, so you must first switch to the iframe before you can locate the element with selectors:
driver.switch_to.frame(driver.find_element_by_xpath("PUT IFRAME XPATH HERE"))
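If the button is indeed inside an iframe, a small context manager keeps switching in and out paired. `in_frame` is a hypothetical helper; it relies only on `driver.switch_to.frame()`, which accepts a frame element, its name or id, or its index, and `driver.switch_to.default_content()`, which returns to the top-level document. The iframe XPath remains a placeholder, as in the answer.

```python
from contextlib import contextmanager


@contextmanager
def in_frame(driver, frame_reference):
    """Switch into an iframe for the duration of a `with` block, then switch back.

    frame_reference can be a frame WebElement, its name/id, or its index.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even if a lookup inside failed.
        driver.switch_to.default_content()


# Usage (needs a live driver; the iframe XPath is still a placeholder):
# from selenium.webdriver.common.by import By
# frame = driver.find_element(By.XPATH, "PUT IFRAME XPATH HERE")
# with in_frame(driver, frame):
#     driver.find_element(By.CSS_SELECTOR, "span.entry-method-title").click()
```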

Python/Selenium web scrap how to find hidden src value from a links?

Scraping links should be simple: usually you just grab the href value of the a tag.
I recently came across this website (https://sunteccity.com.sg/promotions) where the href value of each item's a tag cannot be found, but the redirection still works. I'm trying to figure out a way to grab the items and their corresponding links. My typical Python Selenium code looks something like this:
all_items = bot.find_elements_by_class_name('thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))
However, I can't seem to retrieve any href or onclick attributes, and I'm wondering if this is even possible. I also noticed that I can't right-click and choose "Open link in new tab".
Are there any ways around getting the links of all these items?
Edit: Are there any ways to retrieve all the links of the items on the pages?
e.g.:
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
...
Edit:
Adding an image of one such anchor tag for better clarity:
Reverse-engineering the JavaScript that takes you to the promotion pages (see https://sunteccity.com.sg/_nuxt/d4b648f.js) gives you a way to get all the links, which are based on each promotion's HappeningID. You can verify this by running the following in the JS console, which gives you the first promotion's ID:
window.__NUXT__.state.Promotion.promotions[0].HappeningID
Based on that, you can create a Python loop to get all the promotions:
items = driver.execute_script("return window.__NUXT__.state.Promotion;")
for item in items["promotions"]:
    base = "https://sunteccity.com.sg/promotions/"
    happening_id = str(item["HappeningID"])
    print(base + happening_id)
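The same loop can be wrapped so it returns the URLs instead of printing them, ready to pass to `driver.get()`. This sketch assumes `window.__NUXT__.state.Promotion` keeps the shape shown above; it is the site's internal state, not a public API, so it may change without notice. `promotion_urls` and `fetch_promotion_urls` are hypothetical names.

```python
BASE = "https://sunteccity.com.sg/promotions/"


def promotion_urls(promotion_state):
    """Build promotion URLs from the Nuxt store's Promotion state (a dict)."""
    return [BASE + str(item["HappeningID"]) for item in promotion_state.get("promotions", [])]


def fetch_promotion_urls(driver):
    """Read the (assumed) Nuxt state from the live page and build the URLs."""
    # execute_script converts the returned JS object into Python dicts and lists.
    state = driver.execute_script("return window.__NUXT__.state.Promotion;")
    return promotion_urls(state or {})
```

Keeping the URL-building in a pure function means it can be checked against a saved copy of the state without opening a browser.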
That generated the following output:
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
https://sunteccity.com.sg/promotions/764
https://sunteccity.com.sg/promotions/766
https://sunteccity.com.sg/promotions/762
https://sunteccity.com.sg/promotions/767
https://sunteccity.com.sg/promotions/732
https://sunteccity.com.sg/promotions/733
https://sunteccity.com.sg/promotions/735
https://sunteccity.com.sg/promotions/736
https://sunteccity.com.sg/promotions/737
https://sunteccity.com.sg/promotions/738
https://sunteccity.com.sg/promotions/739
https://sunteccity.com.sg/promotions/740
https://sunteccity.com.sg/promotions/741
https://sunteccity.com.sg/promotions/742
https://sunteccity.com.sg/promotions/743
https://sunteccity.com.sg/promotions/744
https://sunteccity.com.sg/promotions/745
https://sunteccity.com.sg/promotions/746
https://sunteccity.com.sg/promotions/747
https://sunteccity.com.sg/promotions/748
https://sunteccity.com.sg/promotions/749
https://sunteccity.com.sg/promotions/750
https://sunteccity.com.sg/promotions/753
https://sunteccity.com.sg/promotions/755
https://sunteccity.com.sg/promotions/756
https://sunteccity.com.sg/promotions/757
https://sunteccity.com.sg/promotions/758
https://sunteccity.com.sg/promotions/759
https://sunteccity.com.sg/promotions/760
https://sunteccity.com.sg/promotions/761
https://sunteccity.com.sg/promotions/763
https://sunteccity.com.sg/promotions/765
https://sunteccity.com.sg/promotions/730
https://sunteccity.com.sg/promotions/734
https://sunteccity.com.sg/promotions/623
You are using the wrong locator; it matches a lot of irrelevant elements.
Instead of find_elements_by_class_name('thumb-img'), try find_elements_by_css_selector('.collections-page .thumb-img'), so your code will be:
all_items = bot.find_elements_by_css_selector('.collections-page .thumb-img')
for promo in all_items:
    a = promo.find_elements_by_tag_name("a")
    print("a[0]: ", a[0].get_attribute("href"))
You can also get the desired links directly with the .collections-page .thumb-img a locator, so your code could be:
links = bot.find_elements_by_css_selector('.collections-page .thumb-img a')
for link in links:
    print(link.get_attribute("href"))

unable to interact with this dynamic drop down using python selenium

hi team,
I am trying to access a dynamic drop-down whose tag is a div. I am able to find it but not interact with it, because its style changes as shown below.
<div style ="display :none;"></div>
to
<div style ="display :block;"></div>
I am unable to click on it; please have a look at the screenshot for details.
Info: you have to click on the element to open this dynamic dropdown.
A div is not a natively clickable object in HTML. If some JavaScript code is attached to display it when you click it, then you may also need JavaScript to click it:
driver.execute_script("arguments[0].click()", item)
In the same way, you can change its style:
driver.execute_script("arguments[0].style.display = 'block';", item)
In this minimal working example, I hide all img elements on this page.
from selenium import webdriver

url = 'https://stackoverflow.com/questions/65931008/unable-to-interact-with-this-dynamic-drop-down-using-python-selenium'
driver = webdriver.Firefox()
driver.get(url)
all_items = driver.find_elements_by_xpath('//img')
for item in all_items:
    print(item.text)
    #driver.execute_script("arguments[0].click()", item)
    driver.execute_script("arguments[0].style.display = 'none';", item)
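The two `execute_script` calls from this answer can be combined into a helper that reveals a hidden element and then clicks it. `reveal_and_click` is a hypothetical name. Note that a JavaScript click skips Selenium's visibility checks and does not exactly mimic a real user, so the page may react differently than to a manual click.

```python
# Scripts taken from the answer above; arguments[0] is the element passed in.
SHOW_JS = "arguments[0].style.display = 'block';"
CLICK_JS = "arguments[0].click();"


def reveal_and_click(driver, element):
    """Force a hidden (display:none) element visible, then click it via JavaScript."""
    driver.execute_script(SHOW_JS, element)
    # A JavaScript click bypasses Selenium's interactability checks.
    driver.execute_script(CLICK_JS, element)
```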
Solution: the element we are looking for here is actually a masked element, which means its actual id is different. I happened to find it by going through the HTML code row by row (quite a static way to do it, but that's how I did it) and put that id in the code.
Please comment if you know a more efficient way to work around masked elements.
Regards,
Anubhav

Can't get "WebDriver" element data if not "eye-visible" in browser using Selenium and Python

I'm scraping with Selenium in Python. My problem is that after finding all the WebElements, I'm unable to get their info (id, text, etc.) if the element is not actually VISIBLE in the browser window opened by Selenium.
What I mean is:
First image
Second image
As you can see from the first and second images, the first 4 "tables" are "visible" both to me and to the code. However, there are 2 other tables (5 and 6, "Gettho lucky dip" and "Sue Specs") that are not "visible" until I drag down the scroll bar on the right.
Here's what I get when I try to get the element info, without "seeing it" in the page:
Third image
Manually dragging the page to the bottom, and therefore making it "visible" to the human eye (and also to the code?), is the only way I can get the data I need from the WebDriver element:
Fourth image
What am I missing? Why can't Selenium do it in the background? Is there a way to solve this problem without scrolling up and down the page?
PS: the page could be any greyhound race page on http://greyhoundbet.racingpost.com/. Just click City, then Time, and then FORM.
Here's part of my code:
# I call this function with the URL and it returns the driver object
def open_main_page(url):
    chrome_path = r"c:\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get(url)
    # Wait for page to load
    loading(driver, "//*[@id='showLandingLADB']/h4/p", 0)
    element = driver.find_element_by_xpath("//*[@id='showLandingLADB']/h4/p")
    element.click()
    # Wait for second element to load, after click
    loading(driver, "//*[@id='landingLADBStart']", 0)
    element = driver.find_element_by_xpath("//*[@id='landingLADBStart']")
    element.click()
    # Wait for main page to load.
    loading(driver, "//*[@id='whRadio']", 0)
    return driver
Now I have the browser "driver", which I can use to find the elements I want:
url = "http://greyhoundbet.racingpost.com/#card/race_id=1640848&r_date=2018-09-21&tab=form"
browser = open_main_page(url)
# Find dog names
names = []
text: str
tags = browser.find_elements_by_xpath("//strong")
Now tags is a list of WebElement objects, as shown in the figures.
I'm pretty new to this area.
UPDATE:
I've solved the problem with a code workaround.
tags = driver.find_elements_by_tag_name("strong")
for tag in tags:
    driver.execute_script("arguments[0].scrollIntoView();", tag)
    print(tag.text)
In this manner the browser will move to the element position and it will be able to get its information.
However, I still have no idea why, on this page in particular, I'm not able to read elements that are not visible in the browser area until I scroll and literally see them.
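A sketch that reads each element's text and only falls back when `.text` comes back empty. It assumes (untested on this page) that Selenium's `.text` returns only text it considers rendered, while `get_attribute("textContent")` reads the raw DOM text; if the page only fills in content as you scroll, the scroll fallback from the update above is still needed. `element_texts` is a hypothetical helper that takes already-found elements, such as the `tags` list.

```python
SCROLL_JS = "arguments[0].scrollIntoView();"


def element_texts(driver, elements):
    """Read text from each element, falling back when .text comes back empty."""
    texts = []
    for el in elements:
        text = el.text
        if not text:
            # textContent ignores rendering, so it may hold text that .text skipped.
            text = (el.get_attribute("textContent") or "").strip()
        if not text:
            # Last resort (the workaround from the update): scroll into view and retry.
            driver.execute_script(SCROLL_JS, el)
            text = el.text
        texts.append(text)
    return texts
```

Scrolling is only done for elements that still have no text, so pages where `textContent` is already populated avoid the up-and-down scrolling entirely.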
