Python Selenium white page

I want to get the HTML from here. The link works fine in my browser, but if I disable cookies in the settings, the page reloads endlessly.
My basic code returns a blank page:
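from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--start-maximized")
# Raw string: "\U" in a normal string literal is an invalid escape in Python 3
cpll = r"C:\Users\aaa\chromedriver.exe"
driver = webdriver.Chrome(cpll, chrome_options=options)
driver.get("https://www.elal.com/en/PassengersInfo/Useful-Info/Flight-Schedule/Pages/Flight-Updates.aspx")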
I tried adding cookies, ignoring SSL errors, and changing the driver version, but I still get this page...
What could be the problem?

With Firefox instead of Chrome:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("https://www.elal.com/en/PassengersInfo/Useful-Info/Flight-Schedule/Pages/Flight-Updates.aspx")
... I get a normal page.
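If you want to tell the blank-page case apart from a normal load in code, rather than by looking at the browser window, one option is to check whether the returned HTML has any visible text. This is only a rough sketch and not something from the original post: the names `visible_text` and `looks_blank` are made up for illustration, and "no visible text means blank" is an assumed heuristic. It uses only the standard library, so you can feed it `driver.page_source` from either Chrome or Firefox.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring content that a browser would not render."""

    # Tags whose content is not shown in the page body (assumed list).
    _SKIP = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-whitespace text outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text nodes of an HTML string joined by single spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_blank(html):
    """Heuristic: treat a page with no visible text as blank."""
    return visible_text(html) == ""
```

After `driver.get(...)`, calling `looks_blank(driver.page_source)` would then give a quick signal that Chrome returned the white page while Firefox did not.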

Related

Selenium to check if a website displays cookie consent popup

I am building a dynamic scraper with selenium and flask that can take in any URL and scrape for cookies and other details. Now I want to check if the URL has any cookie consent popup. I am unable to make this feature dynamic.
I have tried PARTIAL_LINK_TEXT, but it works only for some websites:
url="https://www.spitzer-silo.com/"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
driver.get(url)
myDiv = driver.find_element(By.PARTIAL_LINK_TEXT, 'Cookie')
https://www.spitzer-silo.com/ works.
https://www.siemens.com/de/de.html doesn't work.
Also, I am searching for the keyword "Cookie", which may not be present on some websites.
As another approach, I tried using the window handles, but only one window is listed:
url="https://www.siemens.com/de/de.html"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
# Create the webdriver object and pass the arguments
options = webdriver.ChromeOptions()
# Chrome will start in Headless mode
options.add_argument('headless')
# Ignores any certificate errors if there is any
options.add_argument("--ignore-certificate-errors")
# Startup the chrome webdriver with executable path and
# pass the chrome options and desired capabilities as
# parameters.
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
# Send a request to the website and let it load
driver.get(url)
time.sleep(30)
whandle = driver.window_handles
# whandle is ['CDwindow-E9E6A9B1021BBA75132EF9DCA40A2824']
Is there any way to check whether a popup is present on the website, and then check whether the popup contains the text "cookie"?
I appreciate all the help I can get.
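A minimal sketch of one possible approach, under some loud assumptions: consent banners are usually ordinary elements in the page rather than separate windows (which would explain why `window_handles` lists only one), so the idea is to scan displayed container elements for consent-related words instead of relying on a link text. The keyword list, the CSS selector, the `max_chars` cut-off, and the function names are all hypothetical choices for illustration. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, used here so the snippet has no imports.

```python
# Assumed keyword list; extend it for the languages you need to cover.
CONSENT_KEYWORDS = ("cookie", "consent", "datenschutz", "einwilligung")


def mentions_consent(text, keywords=CONSENT_KEYWORDS):
    """Return True if the text contains any consent-related keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def find_consent_elements(driver, keywords=CONSENT_KEYWORDS, max_chars=1500):
    """Return displayed elements whose text looks like a consent banner.

    max_chars filters out large wrappers (such as a whole-page container)
    that merely contain the banner somewhere inside them.
    """
    candidates = driver.find_elements(
        "css selector", "div, section, aside, dialog, [role='dialog']"
    )
    matches = []
    for element in candidates:
        if not element.is_displayed():
            continue
        text = element.text
        if text and len(text) <= max_chars and mentions_consent(text, keywords):
            matches.append(element)
    return matches
```

After `driver.get(url)` and a short wait, `bool(find_consent_elements(driver))` would act as the "has a cookie popup" flag. A banner rendered inside an iframe or a shadow root would not be found by this query, so treat a negative result as "not detected" rather than "not present".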

Selenium browser is getting an enable cookies page, not the page I am sending it to

I am trying to scrape a js website with selenium. When beautiful soup reads what selenium retrieved I get an html page that says: "Cookies must be enabled in order to view this page."
If anyone could help me past this stumbling block I would appreciate it. Here is my code:
# import libraries and specify URL
import lxml as lxml
import pandas as pd
from bs4 import BeautifulSoup
import html5lib
from selenium import webdriver
import urllib.request
import csv
url = "https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2020/06/09"
#new chrome session
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(executable_path='/Users/susanwhite/PycharmProjects/Horse Racing/chromedriver', chrome_options=chrome_options)
# Wait for the page to fully load
driver.implicitly_wait(time_to_wait=10)
# Load the web page
driver.get(url)
cookies = driver.get_cookies()
# Parse HTML code and grab tables with Beautiful Soup
soup = BeautifulSoup(driver.page_source, 'html5lib')
print(soup)
Try removing this line: chrome_options.add_argument("--incognito"). There's no need for it, as Selenium naturally doesn't save cookies or any other information from websites.
Removing the line below solved it for me, but headless mode will then be disabled and the browser window will be visible:
chrome_options.add_argument("--headless")
Your issue might also be specific to the website you're accessing. I had the same problem, and after poking around, it looks like something in the way the HKJC website loads makes Selenium think the page has finished loading prematurely. I was able to get good page_source objects by putting a time.sleep(30) after the get call, so my code looks like this:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options, executable_path=r'C:\Python\webdrivers\geckodriver.exe')
driver.get("https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2023/01/01&RaceNo=1")
time.sleep(30)
html = driver.page_source
# The with block closes the file on exit, so no explicit f.close() is needed
with open('Date_2023-01-01_Race1.html', 'wb') as f:
    f.write(html.encode('utf-8'))
You might not have to sleep that long. I found manually loading the pages takes 20+ seconds for me because I have slow internet over VPNs. It also works headless for me as above.
You do have to make sure the Firefox geckodriver is the latest version (at least according to other posts; I only tried this over ~2 days, which is not long enough for my installed Firefox and geckodriver to get out of sync).
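If you would rather not hard-code a 30-second sleep, a polling loop can return as soon as the placeholder text is gone. This is just a sketch under assumptions: the function name `wait_for_real_page` is made up, and it assumes the placeholder page always contains the "Cookies must be enabled" message quoted in the question. It only reads `driver.page_source`, so it works with either Chrome or Firefox. Selenium's own `WebDriverWait` can express the same idea; the loop below is a dependency-free version.

```python
import time


def wait_for_real_page(driver, blocker="Cookies must be enabled",
                       timeout=30.0, poll=0.5):
    """Poll driver.page_source until the blocker text is no longer present.

    Returns the HTML once it no longer contains the blocker text, or raises
    TimeoutError if the placeholder is still showing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if html and blocker not in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "page still shows %r after %.1f seconds" % (blocker, timeout)
            )
        time.sleep(poll)
```

In the code above you would replace `time.sleep(30)` and `html = driver.page_source` with `html = wait_for_real_page(driver)`. If the site briefly serves some other partial page that lacks the message, this returns too early, so checking for an element you expect on the results page would be a stricter condition.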

How to automate secure encrypted sites in Selenium using python?

I'm new to Python. I'm trying to do automation by opening a login page in Selenium.
from selenium import webdriver
browser = webdriver.Chrome(executable_path='chromedriver')
I tested some sites, such as 'https://www.google.com/', and they work perfectly fine:
url = 'https://www.google.com/'
browser.get(url)
I'm trying to open the URL below:
url = 'https://yesonline.yesbank.co.in/index.html?module=login'
browser.get(url)
I get the following error in the Selenium browser, although the URL works fine without Selenium:
Access Denied
You don't have permission to access
"http://yesonline.yesbank.co.in/index.html?" on this server.
Reference
#18.ef87d317.1625646692.41fe4bc0
When I open just the base URL, it opens, but the site loads only partially and keeps showing a loading indicator:
url = 'https://yesonline.yesbank.co.in'
browser.get(url)
I feel like I am missing something when opening the login URL, but I can't work out what exactly.
I also tried changing the webdriver, i.e. using Firefox:
url = 'https://yesonline.yesbank.co.in'
firefox_browser = webdriver.Firefox()
And guess what, it opened!
But as soon as I try to get the login page (even by clicking the login link manually with the mouse):
url = 'https://yesonline.yesbank.co.in/index.html?module=login'
firefox_browser.get(url)
firefox_browser gets closed with a session reset error.
Can someone help me open secure sites in Selenium? Or is there another way to get this done?
It's finally working with chromedriver after adding some arguments to it:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
options.add_argument('--disable-blink-features=AutomationControlled')
browser = webdriver.Chrome(executable_path='chromedriver', options=options)

Python / Selenium - how do i stay signed in after calling a second driver.get()?

I have this code to log in to CBT Nuggets, and afterwards I want to go to my playlists and collect some URLs:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import ui
capabilities = DesiredCapabilities.CHROME
capabilities["goog:loggingPrefs"] = {"performance": "ALL"} # chromedriver 75+
options = webdriver.ChromeOptions()
options.add_argument(f"user-data-dir={userdata_path}") #Path to your chrome profile
# options.add_experimental_option("excludeSwitches", ['enable-automation'])
# options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
driver = webdriver.Chrome(executable_path=webdriver_path, options=options)
driver.get("https://www.cbtnuggets.com/login")
logs = driver.get_log("performance")
def page_is_loaded(driver):
    return driver.find_element_by_tag_name("body") is not None
#wait=ui.WebDriverWait(driver,300)
driver.implicitly_wait(10)
#wait.until(page_is_loaded)
USERNAME = driver.find_element_by_xpath('//*[@id="email"]')
USERNAME.send_keys("johndoe@gmail.com")
PASSWORD = driver.find_element_by_xpath("/html/body/div[1]/div[2]/main/div/div[1]/form/fieldset/div[2]/input")
PASSWORD.send_keys("password")
Login_Button=driver.find_element_by_xpath("/html/body/div[1]/div[2]/main/div/div[1]/form/fieldset/button")
Login_Button.click()
driver.get("https://www.cbtnuggets.com/learn/it-training/playlist/nrn:playlist:user:5fcf88f463ebba00155acb18/2?autostart=1")
It all works as expected, but when the last driver.get() executes, I get thrown back to the login page. When I manually enter the second URL in the address bar, it works as expected without my having to log in again.
I don't know whether this is a Selenium issue or whether I am misunderstanding something about how HTTP GET works.
Have you tried to parse the login result? This might happen because the login request is not fully processed yet.
After Login_Button.click() you should check whether the login succeeded. There are several ways to check:
If the site redirects: check the title of the page.
If the site displays a dialog: create a fluent wait for the dialog element to be displayed.
If you don't want to bother checking, just add time.sleep(5), although that is bad practice.
After the check, use driver.get to go to the page you want.
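The redirect check above can be sketched as a small polling helper. It assumes, and this is only an assumption about the site, that a successful login navigates away from a URL containing "/login"; the function name and default values are made up for illustration. Selenium also ships `WebDriverWait` together with `expected_conditions.url_changes`, which covers the same case; the loop below just avoids the extra imports.

```python
import time


def wait_until_logged_in(driver, login_fragment="/login",
                         timeout=15.0, poll=0.25):
    """Poll driver.current_url until it no longer contains login_fragment.

    Returns True once the browser has left the login URL, or False if it is
    still there when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if login_fragment not in driver.current_url:
            return True
        time.sleep(poll)
    return False
```

In the question's code this would go between `Login_Button.click()` and the final `driver.get(...)`: only call the second `driver.get` when `wait_until_logged_in(driver)` returns True, and otherwise inspect the page for a login error.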

how can i remove notifications and alerts from browser? selenium python 2.7.7

I am trying to submit information in a webpage, but selenium throws this error:
UnexpectedAlertPresentException: Alert Text: This page is asking you
to confirm that you want to leave - data you have entered may not be
saved.
It's not a leave notification; here is a picture of the notification:
[image: screenshot of the notification]
If I click "never show this notification again", my choice doesn't get saved. Is there a way to save it, or to disable all notifications?
Edit: I'm using Firefox.
You can disable the browser notifications, using chrome options. Sample code below:
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
With the latest version of Firefox, the above preferences didn't work.
Below is a solution that disables notifications using a Firefox profile object:
_browser_profile = webdriver.FirefoxProfile()
_browser_profile.set_preference("dom.webnotifications.enabled", False)
webdriver.Firefox(firefox_profile=_browser_profile)
Disable notifications when using Remote Object:
webdriver.Remote(desired_capabilities=_desired_caps, command_executor=_url, options=_custom_options, browser_profile=_browser_profile)
selenium==3.11.0
Usually, with browser settings like this, any changes you make are going to get thrown away the next time Selenium starts up a new browser instance.
Are you using a dedicated Firefox profile to run your Selenium tests? If so, set this setting to what you want in that Firefox profile and then close the browser. That should properly save it for its next use. You will need to tell Selenium to use this profile, though; that's done via SetCapabilities when you start the driver session.
This will do it:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.set_preference("dom.webnotifications.enabled", False)
# options= replaces the deprecated firefox_options= keyword
browser = webdriver.Firefox(options=options)
For Google Chrome and v3 of Selenium you may receive "DeprecationWarning: use options instead of chrome_options", so you will want to do the following:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument('--disable-notifications')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
Note: I am using webdriver-manager, but this also works with specifying the executable_path.
This answer is an improvement on TH Todorov's code snippet, based on what works as of Chrome version 80.0.3987.163.
In the line lk = os.path.join(os.getcwd(), "chromedriver") you provide the path to chromedriver, which you can download from the ChromeDriver download page.
import os
from selenium import webdriver
lk = os.path.join(os.getcwd(), "chromedriver")
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(lk, options=chrome_options)
