I am trying to get search results from Yahoo Search using Python - Selenium and bs4. I have been able to get the links successfully, but I am not able to click the button at the bottom to go to the next page. I tried one way, but it couldn't identify the button after the second page.
Here is the link:
https://in.search.yahoo.com/search;_ylt=AwrwSY6ratRgKEcA0Bm6HAx.;_ylc=X1MDMjExNDcyMzAwMgRfcgMyBGZyAwRmcjIDc2ItdG9wLXNlYXJjaARncHJpZANidkhMeWFsMlJuLnZFX1ZVRk15LlBBBG5fcnNsdAMwBG5fc3VnZwMxMARvcmlnaW4DaW4uc2VhcmNoLnlhaG9vLmNvbQRwb3MDMARwcXN0cgMEcHFzdHJsAzAEcXN0cmwDMTQEcXVlcnkDc3RhY2slMjBvdmVyZmxvdwR0X3N0bXADMTYyNDUzMzY3OA--?p=stack+overflow&fr=sfp&iscqry=&fr2=sb-top-search
This is what I'm doing to get data from the page, but I need to put it in a loop that changes pages:
page = BeautifulSoup(driver.page_source, 'lxml')
# collect every link inside the main results container
lnks = page.find('div', {'id': 'web'}).find_all('a', href=True)
for i in lnks:
    print(i['href'])
You don't need to scroll down to the bottom. The next button is accessible without scrolling. Suppose you want to navigate 10 pages. The Python script could look like this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('Yahoo Search URL')

# Loop over the pages, waiting for the next button
# to become clickable before each click.
for i in range(10):
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, '//a[@class="next"]')))
    driver.find_element_by_xpath('//a[@class="next"]').click()
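To tie this back to the question, the same loop can scrape the links on each page before clicking next - a minimal sketch, assuming the 'web' container id from the question still matches:

from bs4 import BeautifulSoup

for i in range(10):
    # parse the links on the current page, as in the question
    page = BeautifulSoup(driver.page_source, 'lxml')
    for a in page.find('div', {'id': 'web'}).find_all('a', href=True):
        print(a['href'])
    # then advance to the next page of results
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, '//a[@class="next"]')))
    driver.find_element_by_xpath('//a[@class="next"]').click()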
The next page button is on the bottom of the page so you first need to scroll to that element and then click it. Like this:
import time
from selenium.webdriver.common.action_chains import ActionChains

actions = ActionChains(driver)
next_page_btn = driver.find_element_by_css_selector("a.next")
actions.move_to_element(next_page_btn).perform()
time.sleep(0.5)
next_page_btn.click()
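Alternatively, you can let the browser do the scrolling with a line of JavaScript - a minimal sketch using the same locator:

next_page_btn = driver.find_element_by_css_selector("a.next")
# scroll the button into the viewport, then click it
driver.execute_script("arguments[0].scrollIntoView(true);", next_page_btn)
next_page_btn.click()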
I am trying to click an object that I select with XPath, but there seems to be a problem locating the element. I am trying to click accept on the page's "Terms of Use" button. The code I have written is as follows:
driver.get(link)
accept_button = driver.find_element_by_xpath('//*[@id="terms-ok"]')
accept_button.click()
prov = driver.find_element_by_id("province-region")
prov.click()
Here is the HTML code I have:
And I am getting a "NoSuchElementException". My goal is to click the "Kabul Ediyorum" ("I Accept") button at the bottom of the HTML code. I started to think that the page puts some restrictions on what we can do. Any ideas?
Not really sure what the issue might be.
But you could try the following:
Try to locate the element by its visible text
driver.find_element_by_xpath("//*[text()='Kabul Ediyorum']").click()
Try with ActionChains
For that you need to import ActionChains
from selenium.webdriver.common.action_chains import ActionChains
accept_button = driver.find_element_by_xpath("//*[text()='Kabul Ediyorum']")
actions = ActionChains(driver)
actions.click(on_element=accept_button).perform()
Also make sure you have an implicit wait set:
# implicit wait
driver.implicitly_wait(10)
Or an explicit wait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Kabul Ediyorum']"))).click()
Hope this helped!
I'm trying to load all products of a website where around 20 products are shown and then a "show more" button has to be clicked to load more products. I can see that the button is clicked when I execute the code, but the content of the website is not loaded. In the end a loop shall be built to load all the products, but to do so the content first has to be visible.
import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from time import sleep

url = 'https://www.galaxus.ch/de/s2/sector/haushalt-2'
driver = webdriver.Chrome(r'C:\DRIVERS\chromedriver_win32\chromedriver.exe')
driver.get(url)
sleep(3)

# scroll the page footer into view so the "show more" button is reachable
element = driver.find_element_by_id('pageFoot')
driver.execute_script("arguments[0].scrollIntoView();", element)
sel = Selector(text=driver.page_source)
sleep(10)
driver.find_element_by_xpath('//*[@id="pageContent"]/div[2]/div/div[5]/div[5]/div/button/span').click()
sleep(20)
driver.close()
I've also tried to click on other objects within the website but the access is then denied.
Welcome to the site. Try this locator for your button:
driver.find_element_by_css_selector(".styled__DeprecatedStyledButton-sc-1wug1tr-1.isotoG")
If you still have a problem, then scroll to a different element, for example to this button. As I can see, it is not visible if you scroll to the very bottom.
Also, time.sleep() is fine for debugging, but in real projects you should only use explicit/implicit waits, with some exceptions where time.sleep() is the only option.
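For example, an explicit wait on the suggested locator might look like this - a sketch; the class name comes from the answer above and looks auto-generated, so it may change between deployments:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

show_more = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable(
        (By.CSS_SELECTOR, ".styled__DeprecatedStyledButton-sc-1wug1tr-1.isotoG")
    )
)
show_more.click()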
I am trying to scrape information on a website where the information is not immediately present. When you click a certain button, the page begins to load new content at the bottom, and after it's done loading, red text shows up as "Assists (At Least)". I am able to find the first button, "Go to Prop builder", which doesn't immediately show up on the page, but after the script clicks that button, it times out when trying to find the "Assists (At Least)" text, even though the script sleeps and the text is present on the screen.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://www.bovada.lv/sports/basketball/nba')
# this part succeeds
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//span[text()='Go to Prop builder']")
    )
)
element.click()
time.sleep(5)
# this part fails
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[text()='Assists (At Least)']")
    )
)
time.sleep(2)
innerHTML = driver.execute_script('return document.body.innerHTML')
driver.quit()
soup = BeautifulSoup(innerHTML, 'html.parser')
The problem is that the "Assists" element is inside a frame. You need to switch to the frame first, like this:
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME,"player-props-frame")))
driver.switch_to.frame(frame)
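Once you are done scraping inside the frame, remember to switch back to the main document before locating anything outside it:

# return to the top-level document when finished with the frame
driver.switch_to.default_content()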
Increase the timeout to confirm that the timeout provided is correct; you can also confirm this in debug mode. If the issue still persists, check that the "Assists (At Least)" element does not sit inside a frame.
You can also share the DOM and the exact error message if the issue is not resolved.
I have a couple of suggestions you could try:
Make sure that the content loaded at the bottom of the page is not inside a frame. If it is, you need to switch to that particular frame.
Check that the XPath is correct; verify that the XPath matches from the Developer Console.
Inspect the element from the browser; once the Developer Console is open, press CTRL+F and then try your XPath. If it doesn't highlight anything, check the frames.
Check if there are any iframes on the page: search for "iframe" in the page source, and if you find one around the field you are looking for, switch to that frame first.
driver.switch_to.frame("name of the iframe")
Try adding retry logic with a timeout, and use a refresh button if there is one on the page:
from selenium.common.exceptions import TimeoutException

st = time.time()
while st + 180 > time.time():
    try:
        element2 = WebDriverWait(driver, 6).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//*[text()='Assists (At Least)']")
            )
        )
        break  # found it, stop retrying
    except TimeoutException:
        pass  # not visible yet, try again
The content you want is in an iframe. You can access it by switching to it first, like this:
iframe=driver.find_element_by_css_selector('iframe[class="player-props-frame"]')
driver.switch_to.frame(iframe)
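After the switch, the original wait from the question should be able to see the text - for example:

# inside the frame, the question's locator works again
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located((By.XPATH, "//*[text()='Assists (At Least)']"))
)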
Round brackets are the issue here (at least in some cases). If possible, use the contains() function instead:
//*[contains(text(),'Assists ') and contains(text(),'At Least')]
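A sketch of that selector plugged into the explicit wait from the question:

element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[contains(text(),'Assists ') and contains(text(),'At Least')]")
    )
)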
I am scraping an angular.js site. My initial link has a search button, which I find by XPath and click with no issues. After I click search, I want to be able to click each of the athletes in the table to go to their info pages, but I am not having success with the click method. The links are attached to their names.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
TIMEOUT = 5
driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)
url = 'https://n.rivals.com/search#?formValues=%7B%22sport%22:%22Football%22,%22recruit_year%22:2021,%22offer_and_visit_type%22:%5B%22Offer%22%5D,%22prospect_profiles.prospect_colleges.offer%22:true,%22page_number%22:1,%22page_size%22:50%7D'
try:
    driver.get(url)
except TimeoutException:
    pass

search_button = driver.find_element_by_xpath('//*[@id="articles"]/div/div[2]/div/div/div[1]/form/div[2]/div[5]/button')
search_button.click()
# below is where I tried, but could not get it to click
first_athlete = driver.find_element_by_xpath('//*[@id="content_"]/td[1]/div[2]/a')
first_athlete.click()
Works if you remove the last /a in the XPath:
first_athlete = driver.find_element_by_xpath('//*[@id="content_"]/td[1]/div[2]')
first_athlete.click()
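If the click is still flaky because the Angular table renders asynchronously, waiting for clickability first might help - a sketch, assuming the same XPath:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

first_athlete = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="content_"]/td[1]/div[2]'))
)
first_athlete.click()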
If you want to search for a specific athlete and you have the athlete's name, you can use a CSS selector as well:
athlete = driver.find_element_by_css_selector('#content_ > td > div > a[href*="donovan-jackson"]')
athlete.click()
This code will give you a unique web element for each player.
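And if you want every athlete on the page rather than one name, the same markup suggests iterating over the anchors - a sketch, assuming the row structure above:

athletes = driver.find_elements_by_css_selector('#content_ > td > div > a')
for athlete in athletes:
    # collect the profile URLs instead of clicking, so the page is not left mid-iteration
    print(athlete.get_attribute('href'))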
Thanks
I am using selenium to go to a URL and click search. This part works fine. After searching, I want to scrape all the href URLs associated with the athletes on the page and click on the next page. I have tried several class and XPath locators without success.
Goal:
1) Go to the URL listed
2) Click the search button
3) Scrape all the urls that go to each athletes profile page
4) Click the next page button at the bottom
5) Repeat this process through all the pages
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
TIMEOUT = 5
driver = webdriver.Firefox()
driver.set_page_load_timeout(TIMEOUT)
url = 'https://n.rivals.com/search#?formValues=%7B%22sport%22:%22Football%22,%22recruit_year%22:2021,%22offer_and_visit_type%22:%5B%22Offer%22%5D,%22prospect_profiles.prospect_colleges.offer%22:true,%22page_number%22:1,%22page_size%22:50%7D'
try:
    driver.get(url)
except TimeoutException:
    pass

# this click method works
search_button = driver.find_element_by_xpath('//*[@id="articles"]/div/div[2]/div/div/div[1]/form/div[2]/div[5]/button')
search_button.click()

# I cannot find/get the href links below to print:
profile_page = driver.find_elements_by_xpath('//*[@id="content_"]/td[1]/div[2]/div/a')
profile_page = [home.get_attribute("href") for home in profile_page]
print(profile_page)

# I cannot get it to click the next button to do the same thing on the next page:
next_button = driver.find_element_by_xpath('//*[@id="content_"]/td[1]/div[2]/div/a')
next_button.click()
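A minimal sketch of the loop structure the goal describes; the profile-link selector is borrowed from the related answer above, and 'a.pagination-next' is a hypothetical next-button locator that would need to be confirmed against the page:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

all_profiles = []
while True:
    # collect the profile links on the current page
    anchors = driver.find_elements_by_css_selector('#content_ > td > div > a')
    all_profiles.extend(a.get_attribute('href') for a in anchors)
    try:
        # hypothetical locator for the next-page button
        next_button = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, 'a.pagination-next'))
        )
    except TimeoutException:
        break  # no next button, we are on the last page
    next_button.click()
print(all_profiles)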