Selenium to check if a website displays a cookie consent popup - Python

I am building a dynamic scraper with Selenium and Flask that can take in any URL and scrape cookies and other details. Now I want to check whether the URL shows a cookie consent popup, but I am unable to make this feature dynamic.
I have tried PARTIAL_LINK_TEXT; it works only for some websites:
url="https://www.spitzer-silo.com/"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
driver.get(url)
myDiv = driver.find_element(By.PARTIAL_LINK_TEXT, 'Cookie')
https://www.spitzer-silo.com/ works; https://www.siemens.com/de/de.html doesn't.
Also, I am searching for the keyword "Cookie", which may not be present on every website.
As another approach, I tried using window handles, but only one window is reported:
url="https://www.siemens.com/de/de.html"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
# Create the webdriver object and pass the arguments
options = webdriver.ChromeOptions()
# Chrome will start in Headless mode
options.add_argument('headless')
# Ignores any certificate errors if there is any
options.add_argument("--ignore-certificate-errors")
# Startup the chrome webdriver with executable path and
# pass the chrome options and desired capabilities as
# parameters.
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
# Send a request to the website and let it load
driver.get(url)
time.sleep(30)
whandle = driver.window_handles
['CDwindow-E9E6A9B1021BBA75132EF9DCA40A2824']
Is there any way I could check whether the website shows a popup, and then check whether that popup mentions cookies?
I appreciate all the help I can get.
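A heuristic sketch of what such a check could look like (not from the question; the keyword list and the iframe scan are assumptions, and banners inside closed shadow roots will still be missed):

from selenium.webdriver.common.by import By

CONSENT_KEYWORDS = ("cookie", "consent", "gdpr")  # assumed keywords, extend as needed

def has_consent_popup(driver):
    """Rough check: does the page (or any iframe) mention consent keywords?"""
    def mentions_consent():
        src = driver.page_source.lower()
        return any(kw in src for kw in CONSENT_KEYWORDS)

    if mentions_consent():
        return True
    # consent managers often render the banner inside an iframe
    for frame in driver.find_elements(By.TAG_NAME, "iframe"):
        try:
            driver.switch_to.frame(frame)
            if mentions_consent():
                return True
        except Exception:
            pass  # stale or inaccessible frame
        finally:
            driver.switch_to.default_content()
    return False

This only proves the word appears somewhere in the DOM; to confirm a visible popup you could locate the matching element and check element.is_displayed() before trusting the hit.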

Related

Want to load ad-less website using Python on Chrome: Disable acceptable ads for AdBlockerPlus through Selenium

I want to load news websites ad-free on Chrome using Python, Selenium, and AdBlocker Plus (ABP). I can add the .crx file for ABP via the Chrome options. BUT AdBlocker has these 'acceptable, non-intrusive ads' that it lets through. I can go into the settings and disable them for normal browsing, but I do not know how to do that when Chrome is controlled through Selenium automation.
Note:
I downloaded the .crx file for AdBlocker from the Chrome extensions download site https://chrome-extension-downloader.com/, searching for "gighmmpiobklfepjocnamgkkbiglidom".
My program runs without errors; I just want to add the Chrome option that disables the 'Acceptable Ads' for ABP, if that is even possible. This is my code:
from selenium import webdriver

EXECUTABLE = r"~/chromedriver.exe"
ADBLOCK = r"~/AdBlock –-der-beste-Ad-Blocker_v4.35.0.crx"

# set options for the driver
options = webdriver.ChromeOptions()
options.add_extension(ADBLOCK)
driver = webdriver.Chrome(executable_path=EXECUTABLE, options=options)

# get the URL
url = "https://www.dcclothesline.com/author/deangarrison/"
driver.maximize_window()
driver.implicitly_wait(30)
driver.get(url)

# calculate the size of the loaded page
w2 = driver.execute_script("return document.body.offsetWidth;")
h2 = driver.execute_script("return document.body.offsetHeight;")
print("Webpage size with ad block:", w2 * h2)
driver.close()
Is there a way to achieve this?
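A hedged sketch of one way to try it: drive the extension's own settings page after loading it. The options-page filename and the checkbox selector below are placeholders, not the extension's confirmed ones; open chrome-extension://<id>/ manually and inspect the DOM to find the real values.

from selenium import webdriver
from selenium.webdriver.common.by import By

EXT_ID = "gighmmpiobklfepjocnamgkkbiglidom"  # ID from the question; verify in chrome://extensions
ADBLOCK = r"~/AdBlock.crx"                   # path to the downloaded .crx

options = webdriver.ChromeOptions()
options.add_extension(ADBLOCK)
driver = webdriver.Chrome(options=options)

# open the extension's settings page (the file name may differ per extension/version)
driver.get(f"chrome-extension://{EXT_ID}/options.html")

# hypothetical selector -- replace with the real "Allow Acceptable Ads" checkbox
checkbox = driver.find_element(By.CSS_SELECTOR, "input#acceptable-ads")
if checkbox.is_selected():
    checkbox.click()  # untick so "acceptable" ads are blocked too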

Python Selenium: remain logged in to a website

I'm running a simple scraping script to pull a few lines from a website. The problem is that Python always opens a new Chrome window and logs in again every time I run the bot.
So, I read up online and created a Chrome profile. Now it opens the Chrome profile, but it still asks me to log in to the website. I guess I need to save cookies. I tried some YouTube tutorials, but I couldn't figure out how to save them. I'm a complete noob, so can anyone explain to me how to do so?
This is my code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

options = Options()
options.add_argument("user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 2")
driver = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\chromedriver.exe', chrome_options=options)
driver.get("https://websitetologin.com")

search = driver.find_element_by_name("fm-login-id")
search.send_keys("loginemail")
search.send_keys(Keys.RETURN)
time.sleep(3)

search = driver.find_element_by_name("fm-login-password")
search.send_keys("loginpassword")
search.send_keys(Keys.RETURN)
time.sleep(3)

search = driver.find_element_by_class_name("fm-button")
search.send_keys(Keys.RETURN)
time.sleep(3)
You can also use the Chrome option user-data-dir=selenium to point Chrome at a dedicated profile directory:

options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=selenium")
driver = webdriver.Chrome(options=options)

This saves the cookies of the current session into that folder, so later runs can reuse the same profile.
Alternatively, re-add previously saved cookies after loading the domain (cookies below is a list you captured earlier with driver.get_cookies()):

driver.get('http://google.com')
for cookie in cookies:
    driver.add_cookie(cookie)
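To make that last snippet concrete, the cookies list has to come from an earlier session. A minimal sketch with pickle (file name arbitrary; note that add_cookie only works after you have loaded the cookie's domain):

import pickle
from selenium import webdriver

driver = webdriver.Chrome()

# first run: log in, then persist the session cookies to disk
driver.get("https://websitetologin.com")
# ... perform the login steps here ...
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))

# later run: load the same domain first, then restore the cookies
driver.get("https://websitetologin.com")
for cookie in pickle.load(open("cookies.pkl", "rb")):
    driver.add_cookie(cookie)
driver.refresh()  # reload so the site picks up the restored session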

Python / Selenium - how do I stay signed in after calling a second driver.get()?

I have this code to log in to CBT Nuggets, after which I want to go into my playlists and collect some URLs:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import ui

capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # chromedriver 75+
options = webdriver.ChromeOptions()
options.add_argument(f"user-data-dir={userdata_path}")  # path to your Chrome profile
# options.add_experimental_option("excludeSwitches", ['enable-automation'])
# options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
driver = webdriver.Chrome(executable_path=webdriver_path, options=options)
driver.get("https://www.cbtnuggets.com/login")
logs = driver.get_log("performance")

def page_is_loaded(driver):
    return driver.find_element_by_tag_name("body") != None

# wait = ui.WebDriverWait(driver, 300)
driver.implicitly_wait(10)
# wait.until(page_is_loaded)

USERNAME = driver.find_element_by_xpath('//*[@id="email"]')
USERNAME.send_keys("johndoe@gmail.com")
PASSWORD = driver.find_element_by_xpath("/html/body/div[1]/div[2]/main/div/div[1]/form/fieldset/div[2]/input")
PASSWORD.send_keys("password")
Login_Button = driver.find_element_by_xpath("/html/body/div[1]/div[2]/main/div/div[1]/form/fieldset/button")
Login_Button.click()
driver.get("https://www.cbtnuggets.com/learn/it-training/playlist/nrn:playlist:user:5fcf88f463ebba00155acb18/2?autostart=1")
It all works as expected, but when the last driver.get() executes, I get thrown back to the login page; yet when I manually enter the second URL in the address bar, it works without my having to log in again.
I don't know whether this is a Selenium issue or whether I am misunderstanding something about how HTTP GET works.
Have you tried verifying the login result? This can happen because the login request has not been fully processed yet.
After Login_Button.click() you should check whether the site actually logged you in. You have several ways to check:
If the site redirects: check the title of the page.
If the site displays a dialog: create a fluent wait that checks for the dialog element to appear.
If you don't want to bother checking, just add time.sleep(5); it's bad practice, though.
After the check, use driver.get to go to the page you want, as in the sketch below.
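A sketch of the fluent-wait option (the CSS selector is a placeholder for whatever element only exists once you are logged in):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Login_Button.click()
# wait up to 30 s for a marker that only appears after a successful login;
# "nav .account" is a hypothetical selector -- inspect the page for a real one
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "nav .account"))
)
# only now navigate to the playlist URL
driver.get("https://www.cbtnuggets.com/learn/it-training/playlist/nrn:playlist:user:5fcf88f463ebba00155acb18/2?autostart=1")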

Python: Selenium with Chrome Driver hanging after click

I'm trying to automate a search on an e-commerce site using Selenium with ChromeDriver: from the first URL, type the search term, click the search button, and return the next URL; that's it.
But after the click, the next page won't load, and I can't figure out why.
The code:
from selenium import webdriver

def open():
    options = webdriver.ChromeOptions()
    options.add_argument("--window-size=1920x1080")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")
    drive = webdriver.Chrome(options=options)
    url = 'https://www.extra.com.br/'
    drive.get(url)
    busca = drive.find_element_by_id('ctl00_TopBar_PaginaSistemaArea1_ctl05_ctl00_txtBusca')
    busca.send_keys('Informática')
    botao_buscar = drive.find_element_by_id('ctl00_TopBar_PaginaSistemaArea1_ctl05_ctl00_btnOK')
    botao_buscar.click()
    return drive.current_url

if __name__ == '__main__':
    print(open())
I've tried just opening the results URL directly (without submitting the form) and it works.
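If the click does start a navigation but current_url is read before it finishes, waiting for the URL to change is one fix; a sketch under that assumption:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# inside open(), replacing the last two lines:
old_url = drive.current_url
botao_buscar.click()
# block up to 30 s until the browser actually leaves the search page
WebDriverWait(drive, 30).until(EC.url_changes(old_url))
return drive.current_url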

python selenium white page

I want to get the HTML from here. The link works fine in my browser, but if I disable cookies in the settings, the page reloads endlessly.
My basic code returns a blank page:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--start-maximized")
cpll = r"C:\Users\aaa\chromedriver.exe"
driver = webdriver.Chrome(cpll, chrome_options=options)
driver.get("https://www.elal.com/en/PassengersInfo/Useful-Info/Flight-Schedule/Pages/Flight-Updates.aspx")
I tried adding cookies, ignoring SSL errors, and changing the driver version, but I still get this blank page...
What could be the problem?
With

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.elal.com/en/PassengersInfo/Useful-Info/Flight-Schedule/Pages/Flight-Updates.aspx")

... I get a normal page.
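Since the page loops exactly when cookies are disabled, one guess worth ruling out on the Chrome side is a profile that blocks cookies; the pref below explicitly allows them (an assumption, not a confirmed fix):

from selenium import webdriver

options = webdriver.ChromeOptions()
# 1 = allow cookies, 2 = block; make sure Chrome isn't starting with them blocked
options.add_experimental_option("prefs", {"profile.default_content_setting_values.cookies": 1})
driver = webdriver.Chrome(options=options)
driver.get("https://www.elal.com/en/PassengersInfo/Useful-Info/Flight-Schedule/Pages/Flight-Updates.aspx")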
