Selenium / Use pagination on site?

I want to trigger the pagination on this site:
https://www.kicker.de/bundesliga/topspieler/2008-09
I found the element with this XPath in the Chrome inspector:
driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()
Now I want to click this element to go one page further, but I get an error.
This is my code:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
from datetime import datetime, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
if __name__ == '__main__':
    WAIT = 1  # seconds to pause after each page load
    print("Checking chromedriver...")
    os.environ['WDM_LOG_LEVEL'] = '0'
    ua = UserAgent()
    userAgent = ua.random
    options = Options()
    options.add_argument('--headless')
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    options.add_argument("--disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("start-maximized")
    options.add_argument('window-size=1920x1080')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-gpu')
    options.add_argument(f'user-agent={userAgent}')
    srv = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=srv, options=options)
    waitWebDriver = WebDriverWait(driver, 10)
    seasonList = ["2008-09","2009-10","2010-11","2011-12","2012-13","2013-14","2014-15",
                  "2015-16","2016-17","2017-18","2018-19","2020-21", "2021-22"]
    for season in seasonList:
        tmpSeason = f"{season[:4]}/20{season[5:]}"
        link = f"https://www.kicker.de/bundesliga/topspieler/{season}"
        print(f"Working for link {link}...")
        driver.get(link)
        time.sleep(WAIT)
        while True:
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            tmpTABLE = soup.find("table")
            tmpTR = tmpTABLE.find_all("tr")
            driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()
            time.sleep(WAIT)
But I get this error:
Traceback (most recent call last):
  File "C:\Users\Polzi\Documents\DEV\Fiverr\ORDER\fireworkenter\collGrades.py", line 116, in <module>
    driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 400, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 236, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
  (Session info: headless chrome=99.0.4844.82)
How can I go to the next page using Selenium?

To go to the next page, click the next-page element after inducing WebDriverWait for element_to_be_clickable(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.kick__pagination__button--active + a"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'kick__pagination__button--active')]//following::a[1]"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
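To walk through every page rather than just one, the click above can be wrapped in a loop. The following is a minimal sketch that has not been tested against kicker.de: scrape_all_pages, make_next_page_clicker and max_pages are hypothetical names, and it assumes that on the last page no clickable link follows the active pagination button, so the wait times out and the loop stops.

```python
def scrape_all_pages(read_rows, go_to_next_page, max_pages=100):
    """Collect rows from each page until go_to_next_page() returns False."""
    all_rows = []
    for _ in range(max_pages):  # hard cap so a broken selector cannot loop forever
        all_rows.extend(read_rows())
        if not go_to_next_page():
            break
    return all_rows


def make_next_page_clicker(driver, timeout=20):
    """Build a go_to_next_page() callable for a Selenium driver."""
    # Imported here so scrape_all_pages() can be tried without a browser.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    next_link = (By.CSS_SELECTOR, "a.kick__pagination__button--active + a")

    def go_to_next_page():
        try:
            WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(next_link)
            ).click()
        except TimeoutException:
            # Assumption: no clickable successor means this was the last page.
            return False
        return True

    return go_to_next_page
```

With a real driver, read_rows could be something like lambda: BeautifulSoup(driver.page_source, 'html.parser').find("table").find_all("tr"), and go_to_next_page would be make_next_page_clicker(driver). Depending on how the site loads its table, you may also need to wait for the rows to refresh after each click.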

Related

TimeoutException error using WebDriverWait in Selenium when logging in to glassdoor.co.in

I am going to scrape Glassdoor to extract companies' reviews. As a first step, I need to log in to extract all reviews. I added time.sleep calls and a wait before clicking the "Sign In" button, but I still get the following error:
raise TimeoutException(message, screen, stacktrace) TimeoutException
My code is as follows:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.chrome.options import Options
import time
import pandas as pd
from selenium.webdriver.common.action_chains import ActionChains
driver_path= r"C:\Users\TMaghsoudi\Desktop\chromedriver_win32.exe"
# chrome options
options = webdriver.ChromeOptions()
# options.add_argument("--start-maximized")
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
Pros =[]
Cons=[]
Re_Titles =[]
Re_rates= []
Employ_status= []
Re_dates = []
# set driver
driver = webdriver.Chrome(driver_path, chrome_options=options)
# get url
url = "https://www.glassdoor.co.in/Job/index.htm"
driver.get(url)
time.sleep(3)
driver.find_element(By.CLASS_NAME, "HeaderStyles__signInButton").click()
time.sleep(5)
Enter_email= driver.find_element(By.ID, "modalUserEmail")
Enter_email.send_keys("XXXXX")
Enter_email.send_keys(Keys.ENTER)
time.sleep(3)
Enter_pass= driver.find_element(By.ID,"modalUserPassword")
Enter_pass.send_keys("XXXX")
time.sleep(3)
# SingIn= WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='d-flex align-items-center flex-column']/button[@class='gd-ui-button mt-std minWidthBtn css-1dqhu4c evpplnh0']")))
SingIn= WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "evpplnh1")))
SingIn.click()

To send a character sequence to the Email and Password fields, you need to induce WebDriverWait for element_to_be_clickable(). You can use the following locator strategy, where email and password are variables holding your credentials:
Using CSS_SELECTOR:
driver.get("https://www.glassdoor.co.in/Job/index.htm")
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.d-lg-block button.HeaderStyles__signInButton"))).click()
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#modalUserEmail"))).send_keys(email + Keys.RETURN)
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#modalUserPassword"))).send_keys(password + Keys.RETURN)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC

Selenium WebDriverWait until expected condition not working properly on Amazon EC2 instance

I made a script that visits a page, logs in, and then gets a download link from the page.
The script works fine on my local Windows machine, but it does not work on an Amazon EC2 instance (Ubuntu).
The code is below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
dir_chrome_driver = "c:/selenium/driver/chromedriver.exe"
parser = ConfigParser()
option = webdriver.chrome.options.Options()
url = "https://ams.amazon.com/webpublisher/analytics/requested_downloads"
option.add_argument('--user-agent="Chrome/102.0.5005.115"')
option.add_argument("--headless")
option.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path=dir_chrome_driver, options=option)
# driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get(url)
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#ap_email')))
driver.find_element(By.ID, "ap_email").send_keys(USER_ID)
driver.find_element(By.ID, "ap_password").send_keys(USER_PASSWORD)
driver.find_element(By.ID, "signInSubmit").click()
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.download-link')))
download_link = driver.find_element(By.CSS_SELECTOR, ".download-link")
It gives me this error:
  File "aps.py", line 46, in <module>
    WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.download-link')))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 90, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException:
I added a fixed wait between the click and the WebDriverWait, as below:
driver.find_element(By.ID, "signInSubmit").click()
time.sleep(30)
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.download-link')))
It worked for a while, but it stopped working again today.
I tried changing the wait time, but the driver is still on the login page.
Please advise if there is any possible cause or solution.

You need a different setup for Selenium on Ubuntu/Debian:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
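If the wait still times out with this setup, it helps to see what the headless browser is actually showing, since a login page may present an extra verification step on a server that never appears locally. A minimal sketch, assuming only the standard driver attributes current_url, page_source and save_screenshot; run_with_diagnostics and its prefix parameter are hypothetical names:

```python
def run_with_diagnostics(driver, action, prefix="failure"):
    """Run action(); on any error, save what the browser shows, then re-raise."""
    try:
        return action()
    except Exception:
        # Record where the browser ended up before the error propagates.
        print(f"Failed while on: {driver.current_url}")
        driver.save_screenshot(f"{prefix}.png")
        with open(f"{prefix}.html", "w", encoding="utf-8") as handle:
            handle.write(driver.page_source)
        raise
```

For example, the failing wait from the question could be wrapped as run_with_diagnostics(browser, lambda: WebDriverWait(browser, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".download-link")))), after which failure.png and failure.html show the page the server-side browser was stuck on.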

Selenium / Try to click button in shadow-root?

I am trying to click the "accept all" cookie button on this page:
https://www.tiktok.com/@shneorgru
with the following code:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
import xlwings as xw
from datetime import datetime, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
if __name__ == '__main__':
    SAVE_INTERVAL = 5
    WAIT = 1
    print("Checking chromedriver...")
    os.environ['WDM_LOG_LEVEL'] = '0'
    ua = UserAgent()
    userAgent = ua.random
    options = Options()
    # options.add_argument('--headless')
    options.add_argument("start-maximized")
    options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument(f'user-agent={userAgent}')
    srv = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=srv, options=options)
    waitWebDriver = WebDriverWait(driver, 10)
    link = "https://www.tiktok.com/@shneorgru"
    driver.get(link)
    time.sleep(WAIT)
    driverElem = driver.find_element(By.XPATH,"//tiktok-cookie-banner")
    root1 = driver.execute_script("return arguments[0].shadowRoot", driverElem)
    root1.find_element(By.XPATH,"(//button)[2]").click()
    time.sleep(WAIT)
First I select the tag that hosts the shadow root.
Then I execute a script to get its shadowRoot.
Finally I look up the element inside the shadow root and click the button.
But I always get the following error:
$ python collTikTok.py
Checking chromedriver...
Traceback (most recent call last):
  File "C:\Users\Polzi\Documents\DEV\Fiverr\TRY\rosen771\collTikTok.py", line 56, in <module>
    root1.find_element(By.XPATH,"(//button)[2]").click()
  File "C:\Users\Polzi\Documents\DEV\.venv\selenium\lib\site-packages\selenium\webdriver\remote\shadowroot.py", line 59, in find_element
    return self._execute(
  File "C:\Users\Polzi\Documents\DEV\.venv\selenium\lib\site-packages\selenium\webdriver\remote\shadowroot.py", line 94, in _execute
    return self.session.execute(command, params)
  File "C:\Users\Polzi\Documents\DEV\.venv\selenium\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 425, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Polzi\Documents\DEV\.venv\selenium\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
How can I click this cookie accept button?
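
One likely cause, offered as an assumption rather than a confirmed diagnosis: Chromium-based drivers accept only CSS selectors when searching inside a shadow root, so the By.XPATH lookup on the shadow root is rejected with "invalid locator". A minimal sketch that sidesteps the locator by picking the button in JavaScript; click_shadow_button is a hypothetical helper, and it assumes the shadow root is open and that the accept button is the second button inside it, as in the question:

```python
def click_shadow_button(driver, host, button_index=1):
    """Click the button at button_index inside host's open shadow root."""
    button = driver.execute_script(
        "const root = arguments[0].shadowRoot;"
        "if (!root) { return null; }"
        "const buttons = root.querySelectorAll('button');"
        "return buttons.length > arguments[1] ? buttons[arguments[1]] : null;",
        host,
        button_index,
    )
    if button is None:
        # Either the shadow root is closed or it holds fewer buttons than expected.
        raise LookupError(f"no button at index {button_index} in the shadow root")
    button.click()
    return button
```

Here host would be the element returned by driver.find_element(By.XPATH, "//tiktok-cookie-banner"). Staying within Selenium's own API, a CSS lookup on the shadow root, such as find_elements(By.CSS_SELECTOR, "button")[1].click(), should be equivalent.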

Python Selenium checking checkbox: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element

I have written code to select the checkbox on the following website: https://www.theatlantic.com/do-not-sell-my-personal-information/
I have tried the following versions:
Version 1:
ele = driver.find_element_by_id('residency')
driver.execute_script("arguments[0].click()",ele)
Version 2:
checkBox1 = driver.find_element_by_css_selector("input[id='residency']")
Version 3:
driver.find_element_by_xpath("//input[@type='checkbox']")
However, for all of these versions I get the following error:
Traceback (most recent call last):
  File "website-functions/theatlantic.py", line 43, in <module>
    atlantic_DD_formfill(california_resident, email, zipcode)
  File "website-functions/theatlantic.py", line 30, in atlantic_DD_formfill
    driver.find_element_by_xpath("//input[@type='checkbox']")
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
    'value': value})['value']
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@type='checkbox']"}
  (Session info: headless chrome=80.0.3987.87)
Here you can see the full code:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import os
import time
def atlantic_DD_formfill(california_resident, email, zipcode):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=chrome_options)
    driver.set_page_load_timeout(10)
    driver.set_window_size(1124, 850) # set browser size
    # link to data delete form
    print("opening data delete form")
    driver.get("https://www.theatlantic.com/do-not-sell-my-personal-information/")
    #Select California Resident Field:
    #ele = driver.find_element_by_id('residency')
    #driver.execute_script("arguments[0].click()",ele)
    #checkBox1 = driver.find_element_by_css_selector("input[id='residency']")
    #if(NOT(checkBox1.isSelected())):
    #    checkBox1.click()
    driver.find_element_by_xpath("//input[@type='checkbox']")
    print("California Resident Field selected")
    driver.find_element_by_id("email").send_keys(email)
    # send the zip code here, not the email address
    driver.find_element_by_id("zip-code").send_keys(str(zipcode))
    # KEEP THIS DISABLED BC IT ACTUALLY SUBMITS
    # driver.find_element_by_id("SubmitButton2").send_keys(Keys.ENTER)
    print("executed")
    time.sleep(4)
    driver.quit()
    return None
california_resident=True
email = "joe@musterman.com"
zipcode=12345
atlantic_DD_formfill(california_resident, email, zipcode)

There is an iframe on the page, so you first need to switch to that iframe and then click the element. Because another element is placed over the checkbox, you need to use a JavaScript click to click the checkbox.
You can do it like this:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import os
import time
def atlantic_DD_formfill(california_resident, email, zipcode):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=chrome_options)
    driver.set_page_load_timeout(10)
    driver.set_window_size(1124, 850) # set browser size
    # link to data delete form
    print("opening data delete form")
    driver.get("https://www.theatlantic.com/do-not-sell-my-personal-information/")
    # switch into the iframe that contains the form
    driver.switch_to.frame(driver.find_element(By.TAG_NAME, 'iframe'))
    element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//input[@type='checkbox']")))
    # JavaScript click, because another element covers the checkbox
    driver.execute_script("arguments[0].click();", element)

Timeout exception while waiting for webpage with element in headless chrome

The following code uses non-headless Chrome and it works:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--start-maximized')
chrome_options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
driver = webdriver.Chrome(executable_path=os.path.abspath(r"C:\User\Program Files\chrome-driver\chromedriver.exe"))
driver.set_window_size(1200, 600)
driver.get("login-url")
driver.find_element_by_id("loginId").send_keys("uname")
driver.find_element_by_id("newPassword").send_keys("pwd")
driver.find_element_by_name("submit-button").click()
driver.set_window_size(1200, 800)
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID,"user-info")))
v = driver.find_element_by_xpath("//tr[4]/td[5]/span").text
print(v)
When I choose to use headless Chrome:
driver = webdriver.Chrome(executable_path=os.path.abspath(r"C:\User\Program Files\chrome-driver\chromedriver.exe"), chrome_options=chrome_options)
It throws the following exception:
Traceback (most recent call last):
  File "C:/User/workspaces/pyworkspaces/fin2/venv/process.py", line 28, in <module>
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID,"user-info")))
  File "C:\User\workspaces\pyworkspaces\fin2\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
So it seems to fail on the following line in headless Chrome:
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID,"user-info")))
I also tried presence_of_element_located:
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID,"user-info")))
But it still raises TimeoutException. Why is that?

What I did in the past was add the argument '--start-maximized':
chrome_options.add_argument('--start-maximized')
Try it; I hope it helps!

Instead of using presence_of_element_located(), you need to wait for visibility_of_element_located(), and you can use any of the following locator strategies:
Using ID:
element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID,"user-info")))
Using CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#user-info")))
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='user-info']")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
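Another thing worth checking, as an assumption the question does not confirm: headless Chrome typically starts with a small default window, and --start-maximized is commonly reported to have no effect there, so a responsive page may lay out differently and never show the user-info element. A minimal sketch that sets an explicit window size instead; headless_chrome_arguments and build_headless_driver are hypothetical helper names, and the 1920x1080 default is arbitrary:

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Return Chrome flags for a headless session with an explicit viewport."""
    # Chrome expects the size as "width,height" (comma, not "x").
    return ["--headless", f"--window-size={width},{height}"]


def build_headless_driver(width=1920, height=1080):
    """Start headless Chrome with the flags above."""
    # Imported here so the flag helper can be checked without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments(width, height):
        options.add_argument(argument)
    # Assumption: chromedriver can be located automatically (on PATH or
    # through Selenium's own driver management in recent versions).
    return webdriver.Chrome(options=options)
```

If the element appears once the window is larger, the original timeout was a layout issue rather than a problem with the wait condition itself.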
