I'm trying to load all products of a website where around 20 products are shown and then a "show more" button has to be klicked to load more products. I can see that the button is klicked when I execute the code, but the content of the website is not loaded. In the end a loop shall be built to load all the products, but to do so the content first has to be visible.
import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from time import sleep
url = 'https://www.galaxus.ch/de/s2/sector/haushalt-2'
driver = webdriver.Chrome('C:\DRIVERS\chromedriver_win32\chromedriver.exe')
driver.get(url)
sleep(3)
element = driver.find_element_by_id('pageFoot')
driver.execute_script("arguments[0].scrollIntoView(0, document.documentElement.scrollHeight-5);", element)
sel = Selector(text=driver.page_source)
sleep(10)
driver.find_element_by_xpath('//*[#id="pageContent"]/div[2]/div/div[5]/div[5]/div/button/span').click()
sleep(20)
driver.close()
I've also tried to click on other objects within the website but the access is then denied.
Welcome to site. Try this locator for your button.
driver.find_element_by_css_selector(".styled__DeprecatedStyledButton-sc-1wug1tr-1.isotoG")
If you will still have a problem, than scroll into a different element. For example to this button. As I see it is not visible if you scroll to the very bottom.
Also, for debugging time.sleep() is ok, but in real projects you should use only explicit/implicit waits, with some exclusions where time.sleep() is the only option.
Related
Hi Before starting Thanks for the help in advance
So I am trying to scrape google flight website : https://www.google.com/travel/flights
When scraping I have done the sending Key to the text field but I am stuck at clicking the search button it always gives the error that the field is not clickable at a point or Other elements would receive the click
the error image is
and the code is
from time import sleep
from selenium import webdriver
chromedriver_path = 'E:/chromedriver.exe'
def search(urrl):
driver = webdriver.Chrome(executable_path=chromedriver_path)
driver.get(urrl)
asd= "//div[#aria-label='Enter your destination']//div//input[#aria-label='Where else?']"
driver.find_element_by_xpath("/html/body/c-wiz[2]/div/div[2]/c-wiz/div/c-wiz/c-wiz/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[4]/div/div/div[1]/div/div/input").click()
sleep(2)
TextBox = driver.find_element_by_xpath(asd)
sleep(2)
TextBox.click()
sleep(2)
print(str(TextBox))
TextBox.send_keys('Karachi')
sleep(2)
search_button = driver.find_element_by_xpath('//*[#id="yDmH0d"]/c-wiz[2]/div/div[2]/c-wiz/div/c-wiz/c-wiz/div[2]/div[1]/div[1]/div[2]/div/button/div[1]')
sleep(2)
search_button.click()
print(str(search_button))
sleep(15)
print("DONE")
driver.close()
def main():
url = "https://www.google.com/travel/flights"
print(" Getitng Data ")
search(url)
if __name__ == "__main__":
main()
and i have done it by copying the Xpath using dev tools
Thanks again
The problem you are facing is that after entering the city Karachi in the text box, there is a suggestion dropdown that is displayed over the Search Button. That is the cause of the exception as the dropdown would receive the click instead of the Search button. The intended usage of the website is to select the city from the dropdown and then continue.
A quick fix would be to first look for all of the dropdowns in the source (there are a few) and look for the one that is currently active using is_displayed(). Next you would select the first element in the dropdown suggested:
.....
TextBox.send_keys('Karachi')
sleep(2)
# the attribute(role) in the dropdowns element are named 'listbox'. Warning: This could change in the future
all_dropdowns = driver.find_elements_by_css_selector('ul[role="listbox"]')
# the active dropdown
active_dropdown = [i for i in all_dropdowns if i.is_displayed()][0]
# click the first suggestion in the dropdown (Note: this is very specific to your search. It could not be desirable in other circumstances and the logic can be modified accordingly)
active_dropdown.find_element_by_tag_name('li').click()
# I recommend using the advise #cruisepandey has offered above regarding usage of relative path instead of absolute xpath
search_button = driver.find_element_by_xpath("//button[contains(.,'Search')]")
sleep(2)
search_button.click()
.....
Also suggest to head the advise provided by #cruisepandey including research more about Explicit Waits in selenium to write better performing selenium programs. All the best
Instead of this absolute xapth
//*[#id="yDmH0d"]/c-wiz[2]/div/div[2]/c-wiz/div/c-wiz/c-wiz/div[2]/div[1]/div[1]/div[2]/div/button/div[1]
I would recommend you to have a relative path:
//button[contains(.,'Search')]
Also, I would recommend you to have explicit wait when you are trying a click operation.
Code:
search_button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(.,'Search')]")))
search_button.click()
You'd need below imports as well:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Pro tip:
Launch the browser in full screen mode.
driver.maximize_window()
You should have the above line just before driver.get(urrl) command.
I am trying to press a button with selenium because afterwards I need to inspect the full html of the website. This is the code that I am using:
driver = webdriver.Chrome()
driver.get('https://www.quattroruote.it/listino/audi/a4-allroad')
time.sleep(10)
html = driver.find_element_by_id('btnallestimenti')
html.click()
But I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
when the page is open there are cookies and other things that shows up, is there a way to block all of them so that I can work on the html?
Thanks a lot!
As you can see the "cookies" banner is an HTML element itself and it contains a "Close" ("Chiudi") button that can be clicked.
If you inspect the page source you will find this code that relates to that button:
<button type="button" class="iubenda-cs-close-btn" tabindex="0" role="button" aria-pressed="false" style="font-size:16px!important;">Chiudi</button>
Your script needs to be modified to search the element by visible text (using the XPath) and click it to close the banner:
close_button = driver.find_element_by_xpath("//*[text()='Chiudi']")
close_button.click()
I can see that this kind of banner appears 2 times (one for cookies one for "Informativa") but once you click this one away you are redirected to the right page.
Of course you will need to test your script and adjust it to the page's behavior.
Also, be aware that every time the pages change because the devs change it your script will break and you will need to re-adjust it.
EDIT
Posting here the full code, try to use it and continue from here:
import time
from selenium.webdriver import Chrome
driver = Chrome()
driver.get("https://www.quattroruote.it/listino/audi/a4-allroad")
time.sleep(6)
driver.find_element_by_xpath("//button[text()='Accetta']").click()
time.sleep(6)
driver.switch_to.frame("promo-premium-iframe")
driver.find_element_by_xpath("//a[normalize-space()='Non sono interessato']").click()
time.sleep(6)
driver.switch_to.default_content()
driver.find_element_by_id("btnallestimenti").click()
input()
You could try to accept the cookie and proceed further, check the below lines of code.
options = Options()
options.add_argument("--disable-notifications")
driver = webdriver.Chrome(options=options,"ChromeDriver_Path")
driver.maximize_window()
driver.get('https://www.quattroruote.it/listino/audi/a4-allroad')
sleep(10)
cookie_btn = driver.find_element_by_xpath("//button[text()='Accetta']")
cookie_btn.click()
sleep(3)
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"promo-premium-iframe")))
register_btn = driver.find_element_by_xpath("//a[normalize-space()='Accedi o Registrati']")
register_btn.click()
Iframe available so just switched to the iframe, tried the to perform the registration.
I have written a simple web scraping code using Selenium but I want to scrape only the portion that is present 'before scroll'
Say, if it is this page I want to scrape - https://en.wikipedia.org/wiki/Pandas_(software) - Selenium reads information till the absolute last element/text which for me is the 'Powered by Media Wiki' button on the far bottom-right of the page.
What I want Selenium to do is stop after DataFrames (see screenshot) and not scroll down to the bottom.
And I also want to know where on the page it stops. I have checked multiple sources and most of them ask for infinite scroll websites. No one asks for just the 'visible' half of a page.
This is my code now:
from selenium import webdriver
EXECUTABLE = r"chromedriver.exe"
# get the URL
url = "https://en.wikipedia.org/wiki/Pandas_(software)"
# open the chromedriver
driver = webdriver.Chrome(executable_path = EXECUTABLE)
# google window is maximized so that all webpages are rendered in the same size
driver.maximize_window()
# make the driver wait for 30 seconds before throwing a time-out exception
driver.implicitly_wait(30)
# get URL
driver.get(url)
for element in driver.find_elements_by_xpath("//*"):
try:
#stuff
except:
continue
driver.close()
Absolutely any direction is appreciated. I have tried to be as clear as possible here but let me know if any more details are required.
I don't think that is possible. Observe the DOM, all the informational elements are under one section I mean one tag div[#id='content'], which is already visible to Selenium. Even if you try with //*, div[#id='content'] is visible.
And trying to check whether the element is visible though not scrolled, will also return True. (If someone knows to do what you are asking for, even I would like to know.)
from selenium import webdriver
from selenium.webdriver.support.expected_conditions import _element_if_visible
driver = webdriver.Chrome(executable_path = 'path to chromedriver.exe')
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://en.wikipedia.org/wiki/Pandas_(software)")
elements = driver.find_elements_by_xpath("//div[#id='content']//*")
for element in elements:
try:
if _element_if_visible(element):
print(element.get_attribute("innerText"))
except:
break
driver.quit()
I am trying to get search results from yahoo search using python - selenium and bs4. I have been able to get the links successfuly but I am not able to click the button at the bottom to go to the next page. I tried one way, but it could't identify after the second page.
Here is the link:
https://in.search.yahoo.com/search;_ylt=AwrwSY6ratRgKEcA0Bm6HAx.;_ylc=X1MDMjExNDcyMzAwMgRfcgMyBGZyAwRmcjIDc2ItdG9wLXNlYXJjaARncHJpZANidkhMeWFsMlJuLnZFX1ZVRk15LlBBBG5fcnNsdAMwBG5fc3VnZwMxMARvcmlnaW4DaW4uc2VhcmNoLnlhaG9vLmNvbQRwb3MDMARwcXN0cgMEcHFzdHJsAzAEcXN0cmwDMTQEcXVlcnkDc3RhY2slMjBvdmVyZmxvdwR0X3N0bXADMTYyNDUzMzY3OA--?p=stack+overflow&fr=sfp&iscqry=&fr2=sb-top-search
This is what im doing to get data from page but need to put in a loop which changes pages:
page = BeautifulSoup(driver.page_source, 'lxml')
lnks = page.find('div', {'id': 'web'}).find_all('a', href = True)
for i in lnks:
print(i['href'])
You don't need to scroll down to the bottom. The next button is accessible without scrolling. Suppose you want to navigate 10 pages. The python script can be like this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver=webdriver.Chrome()
driver.get('Yahoo Search URL')
# Let's create a loop containing the XPath for next button
# As well as waiting for the next button to be clickable.
for i in range(10):
WebDriverWait(driver, 5).until(EC.element_to_be_clickable(By.XPATH, '//a[#class="next"]'))
navigate = driver.find_element_by_xpath('//a[#class="next"]').click()
The next page button is on the bottom of the page so you first need to scroll to that element and then click it. Like this:
from selenium.webdriver.common.action_chains import ActionChains
actions = ActionChains(driver)
next_page_btn = driver.find_element_by_css_selector("a.next")
actions.move_to_element(next_page_btn).build().perform()
time.sleep(0.5)
next_page_btn.click()
I am trying to scrape information on a website where the information is not immediately present. When you click a certain button, the page begins to load new content on the bottom of the page, and after it's done loading, red text shows up as "Assists (At Least)". I am able to find the first button "Go to Prop builder", which doesn't immediately show up on the page, but after the script clicks the button, it times out when trying to find the "Assists (At Least)" text, in spite of the script sleeping and being present on the screen.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://www.bovada.lv/sports/basketball/nba')
# this part succeeds
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located(
(By.XPATH, "//span[text()='Go to Prop builder']")
)
)
element.click()
time.sleep(5)
# this part fails
element2 = WebDriverWait(driver, 6).until(
EC.visibility_of_element_located(
(By.XPATH, "//*[text()='Assists (At Least)']")
)
)
time.sleep(2)
innerHTML = driver.execute_script('return document.body.innerHTML')
driver.quit()
soup = BeautifulSoup(innerHTML, 'html.parser')
The problem is the Assist element is under a frame. You need to switch to the frame like this:
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME,"player-props-frame")))
driver.switch_to.frame(frame)
Increase the timeout to confirm the timeout provided is correct, You can also confirm using debug mode. If still issue persist, please check "Assists (At Least)" element do not fall under any frame.
You can also share the DOM and proper error message if issue not resolved.
I have a couple of suggestions you could try,
Make sure that the content loaded at the bottom of the is not in a frame. If it is, you need to switch to the particular frame
Check the XPath is correct, try the XPath is matching from the Developer Console
Inspect the element from the browser, once the Developer console is open, press CTRL +F and then try your XPath. if it's not highlighting check frames
Check if there is are any iframes in the page, search for iframe in the view page source, and if you find any for that field which you are looking for, then switch to that frame first.
driver.switch_to.frame("name of the iframe")
Try adding a re-try logic with timeout, and a refresh button if any on the page
st = time.time()
while st+180>time.time():
try:
element2 = WebDriverWait(driver, 6).until(
EC.visibility_of_element_located(
(By.XPATH, "//*[text()='Assists (At Least)']")
)
)
except:
pass
The content you want is in an iFrame. You can access it by switching to it first, like this:
iframe=driver.find_element_by_css_selector('iframe[class="player-props-frame"]')
driver.switch_to.frame(iframe)
Round brackets are the issue here (at least in some cases...). If possible, use .contains selector:
//*[contains(text(),'Assists ') and contains(text(),'At Least')]