I am working on an intranet site with nested frames and am unable to access a child frame.
The HTML source:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>VIS</title>
<link rel="shortcut icon" href="https://bbbbb/ma1/imagenes/iconos/favicon.ico">
</head>
<frameset rows="51,*" frameborder="no" scrolling="no" border="0">
<frame id="cabecera" name="cabecera" src="./blablabla.html" scrolling="no" border="3">
<frameset id="frame2" name="frame2" cols="180,*,0" frameborder="no" border="1">
<frame id="menu" name="menu" src="./blablabla_files/Menu.html" marginwidth="5" scrolling="auto" frameborder="3">
Buscar
<frame id="contenido" name="contenido" src="./blablabla_files/saved_resource.html" marginwidth="5" marginheight="5">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>BUSCAr</title>
</head>
<frameset name="principal" rows="220,*" frameborder="NO">
<frame name="Formulario" src="./BusquedaSimple.html" scrolling="AUTO" noresize="noresize">
<input id="year" name="year" size="4" maxlength="4" value="" onchange="javascript:Orden();" onfocus="this.value='2018';this.select();" type="text">
<frame name="Busqueda" src="./saved_resource(2).html" scrolling="AUTO">
</frameset>
<noframes>
<body>
<p>soporte a tramas.</p>
</body>
</noframes>
</html>
<frame name="frameblank" marginwidth="0" scrolling="no" src="./blablabla_files/saved_resource(1).html">
</frameset>
<noframes>
<P>Para ver esta página.</P>
</noframes>
</frameset>
</html>
I locate the button "Buscar" inside the frame "menu" with:
driver.switch_to_default_content()
driver.switch_to_frame(driver.find_element_by_css_selector("html frameset frameset#frame2 frame#menu"))
btn_buscar = driver.find_element_by_css_selector("#div_menu > table:nth-child(10) > tbody > tr > td:nth-child(2) > span > a")
btn_buscar.click()
I've tried this code to locate the input id="year" inside the frame "Formulario":
driver.switch_to_default_content()
try:
    driver.switch_to_frame(driver.switch_to_frame(driver.find_element_by_css_selector("html frameset frameset#frame2 frame#contenido frameset#principal frame#Formulario")))
    print("Ok cabecera -> contenido")
except:
    print("cabecera not found")
or
driver.switch_to_frame(driver.switch_to_xpath("//*[@id='year']"))
but they don't work.
Can you help me?
Thanks!
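To handle the required frame you need to switch successively through all of its ancestor frames. Note that a <frameset> is only a layout container, not a frame: in the markup above, "menu" is a direct child frame of the top-level document (a sibling of "cabecera", not inside it), so one switch is enough to reach the button:
driver.switch_to.default_content()
driver.switch_to.frame("menu")
btn_buscar = driver.find_element_by_link_text("Buscar")
btn_buscar.click()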
Also note that the WebDriver instance has no switch_to_xpath() method, and that switch_to_frame() and switch_to_default_content() are deprecated, so you'd better use switch_to.frame() and switch_to.default_content() instead.
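For the input id="year" itself, the frame chain in the markup above is "contenido" and then "Formulario", both reached starting from the top-level document. A minimal sketch of a reusable helper, assuming a Selenium-style driver object; switch_to_frame_path is a hypothetical name, not part of Selenium:

```python
def switch_to_frame_path(driver, frame_names):
    """Switch from the top-level document into the last frame in frame_names.

    switch_to_frame_path is a hypothetical helper, not a Selenium API.
    `driver` is assumed to expose Selenium's switch_to.default_content()
    and switch_to.frame(), which accepts a frame name, id or WebElement.
    """
    # Always start from the top-level browsing context.
    driver.switch_to.default_content()
    # Descend one frame at a time; each name is resolved inside the
    # document of the frame selected by the previous step.
    for name in frame_names:
        driver.switch_to.frame(name)


# Usage with a real WebDriver (not executed here):
# switch_to_frame_path(driver, ["contenido", "Formulario"])
# year = driver.find_element("id", "year")
# year.send_keys("2018")
```

find_element("id", ...) is the same as find_element(By.ID, ...), since By.ID is the string "id"; the older find_element_by_* helpers were removed in Selenium 4.3.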
Assuming your program has focus on the top-level browsing context, to locate the button with the text Buscar you need to switch() to its parent frame using WebDriverWait together with the appropriate expected_conditions. Note that frame_to_be_available_and_switch_to_it() takes a single locator tuple. You can use the following code block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "menu")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Buscar"))).click()
I was planning on creating a basic web scraper for the site Sneakersnstuff.com; however, my efforts were stopped early by an error. When requesting the URL https://www.sneakersnstuff.com/, rather than receiving the HTML of the website, or even the entrance captcha, I am redirected to a Cloudflare page with the error message "enable cookies". Both my code and the response are shown below:
import requests
import cfscrape
session = requests.session()
response = session.get('https://www.sneakersnstuff.com/')
print(response.text)
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en-US">
<!--<![endif]-->
<head>
<title>Access denied | www.sneakersnstuff.com used Cloudflare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css"
media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">
body {
margin: 0;
padding: 0
}
</style>
<!--[if gte IE 10]><!-->
<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
<!--<![endif]-->
<!--[if gte IE 10]><!-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
<!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please
enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1>
<span class="cf-error-type" data-translate="error">Error</span>
<span class="cf-error-code">1020</span>
<small class="heading-ray-id">Ray ID: 578133293d83e0d6 • 2020-03-22 16:13:25 UTC</small>
</h1>
<h2 class="cf-subheadline">Access denied</h2>
</div><!-- /.header -->
<section></section><!-- spacer -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>578133293d83e0d6</strong></span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Your IP</span>: 96.241.108.243</span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Performance & security by</span> <a
href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link"
target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
I have tried cfscrape, a library recommended by many, to no avail.
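Before trying workarounds, it can help to detect the block page in code instead of eyeballing the HTML. A minimal sketch that relies only on the markup shown above (the cf-error-code span and the Ray ID in the footer); cloudflare_block_info is a hypothetical helper, and other Cloudflare pages may be structured differently:

```python
import re

# Patterns based only on the Cloudflare error page shown above.
_CODE_RE = re.compile(r'<span class="cf-error-code">\s*(\d+)\s*</span>')
_RAY_RE = re.compile(r"Ray ID:\s*<strong>([0-9a-f]+)</strong>")


def cloudflare_block_info(html):
    """Return (error_code, ray_id) if html looks like a Cloudflare error page, else None."""
    code = _CODE_RE.search(html)
    if code is None:
        return None  # no Cloudflare error code found; probably the real site
    ray = _RAY_RE.search(html)
    return (int(code.group(1)), ray.group(1) if ray else None)
```

For example, cloudflare_block_info(response.text) returns (1020, '578133293d83e0d6') for the page above. Cloudflare uses error 1020 for requests blocked by a firewall rule on the site.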
Adding browser/User-Agent filtering to cloudscraper did the trick for me.
import cloudscraper
from bs4 import BeautifulSoup
# Adding browser/User-Agent filtering should help, i.e. this
# will give you only desktop Firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})
html = scraper.get("https://www.sneakersnstuff.com/").content
soup = BeautifulSoup(html, 'html.parser')
print(soup)
import cloudscraper
from bs4 import BeautifulSoup
scraper = cloudscraper.create_scraper()
html = scraper.get("https://www.sneakersnstuff.com/").content
soup = BeautifulSoup(html, 'html.parser')
print(soup)
Output:
cloudscraper.exceptions.CloudflareReCaptchaProvider: Cloudflare reCaptcha detected, unfortunately you haven't loaded an anti reCaptcha provider correctly via the 'recaptcha' parameter.
Next step? Use a 3rd party reCaptcha solver. From the cloudscraper documentation:
cloudscraper currently supports the following 3rd party reCaptcha solvers, should you require them:
anticaptcha
deathbycaptcha
2captcha
9kw
return_response
I'm writing a Selenium scraper that waits until a page is loaded before trying to locate an element. When I run the script, it looks to me like the element has loaded in the browser window, but Selenium thinks otherwise.
Here's scraper.py:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("--ignore-certificate-errors")
options.add_argument("--test-type")
options.binary_location = "/usr/bin/chromium"
driver = webdriver.Chrome(chrome_options=options)
startingURL = "https://pbcvssp.co.palm-beach.fl.us/webapp/vssp/AltSelfService;jsessionid=0000lH_keoPs-fzs5sSkYGLah1X:-1"
driver.get(startingURL)
driver.find_element_by_name("guest_login").click()
driver.switch_to_window(driver.window_handles[1]) # Go to window with bids
try:
    secondsToWait = 20
    wait = WebDriverWait(driver, secondsToWait)
    openBidsLinkName = "AMSBrowseOpenSolicit"
    openBidsLink = wait.until(
        EC.element_to_be_clickable((By.NAME, openBidsLinkName))
    )
finally:
    print driver.page_source
    driver.find_element_by_name(openBidsLinkName)
But when I run python scraper.py, I get this error:
Traceback (most recent call last):
  File "scraper.py", line 30, in <module>
    driver.find_element_by_name(openBidsLinkName)
  File "/home/me/ENV/pbc_vss/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 495, in find_element_by_name
    return self.find_element(by=By.NAME, value=name)
  File "/home/me/ENV/pbc_vss/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 966, in find_element
    'value': value})['value']
  File "/home/me/ENV/pbc_vss/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/home/me/ENV/pbc_vss/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"name","selector":"AMSBrowseOpenSolicit"}
Also driver.page_source looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><!-- BEGIN GENERATED HTML --><html xmlns="http://www.w3.org/1999/xhtml" lang="en" oncontextmenu="MNU_ShowPopup('Default', event);return false"><head>
<title>Self Service Application
</title>
<base href="https://pbcvssp.co.palm-beach.fl.us:443/webapp/vssp/advantage/AltSelfService/" />
<script language="JavaScript" type="text/javascript" src="../AMSJS/ALTSS/ALTSSUtil.js">
<!---->
</script>
<script language="JavaScript" type="text/javascript" src="../AMSJS/AMSMenu.js">
<!---->
</script>
<script language="JavaScript" type="text/javascript" src="../AMSJS/AMSDHTMLLib.js">
<!---->
</script>
<script language="JavaScript" type="text/javascript" src="../AMSJS/AMSUtils.js">
<!---->
</script>
<script type="text/javascript" language="JavaScript">
<!--
UTILS_InitPage();
-->
</script>
</head>
<frameset border="0" rows="33, *, 25">
<frame name="AdditionalLinks" src="/LoginExternal/Pages/LoginAdditionalLinks.htm" marginwidth="0" title="Additional Links" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#AdditionalLinks" scrolling="no" />
<frameset cols="150, *" border="0">
<frameset border="0" rows="150, *">
<frame name="pPrimaryNavPanel" src="pPrimaryNavPanel.htm" marginwidth="0" title="Navigation" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#Nav" scrolling="no" />
<frame name="Secondary" src="../AMSImages/ALTSS/portal.htm" marginwidth="0" title="Secondary Navigation" target="Display" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#SecondaryNavigator" scrolling="no" />
</frameset>
<frameset id="AltSSLinkFrame" border="0" rows="100, *">
<frame name="Startup" src="https://pbcvssp.co.palm-beach.fl.us/webapp/vssp/AltSelfService;jsessionid=0000CFpQkQ1YDSjZgm-4yMM0lHd:-1?session_id=CFpQkQ1YDSjZgm-4yMM0lHd&page_id=pid_2712&vsaction=pagetransition&vsnavigation=StartPageNav&frame_name=Startup" marginwidth="0" title="Welcome Area" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#PrimaryNav" scrolling="no" vsaction="true" />
<frame name="Display" src="AltSSHomePage.htm" marginwidth="0" title="Display Frame" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#Display" scrolling="auto" />
</frameset>
</frameset>
<frame name="CopyrightInfo" src="/LoginExternal/Pages/LoginCopyrightInfo.html" marginwidth="0" title="Copyright Info" frameborder="0" marginheight="0" longdesc="../AMSImages/ALTSS/SelfServiceFrameDesc.htm#CopyrightInfo" scrolling="no" />
</frameset>
<noframes>
<body>
<p>This page uses frames, but your browser does not support them. FramesetPage requires a Frames-capable browser</p>
</body>
</noframes>
How can I make Selenium locate the element with the name attribute "AMSBrowseOpenSolicit"?
The element you want is inside a frame, so you have to switch to that frame before you can interact with it. This code snippet will help you do it:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# find frame and switch to it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//frame[@title = 'Display Frame']")))
# do your stuff
secondsToWait = 20
wait = WebDriverWait(driver, secondsToWait)
openBidsLinkName = "AMSBrowseOpenSolicit"
openBidsLink = wait.until(
    EC.element_to_be_clickable((By.NAME, openBidsLinkName))
)
driver.find_element_by_name(openBidsLinkName)
driver.switch_to.default_content()  # switch back to default content
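Frame contents never appear in the parent's page_source; only the <frame> tags do, which is why the link was missing from the dump in the question. A minimal sketch (standard library only; FrameLister and list_frames are hypothetical names) that lists the name, title and src of each frame in a page source, so you can pick the one to pass to switch_to.frame():

```python
from html.parser import HTMLParser


class FrameLister(HTMLParser):
    """Collect (name, title, src) for every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <frame ... /> also end up here, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag in ("frame", "iframe"):
            a = dict(attrs)
            self.frames.append((a.get("name"), a.get("title"), a.get("src")))


def list_frames(html):
    parser = FrameLister()
    parser.feed(html)
    return parser.frames


# Usage with a real WebDriver (not executed here):
# for name, title, src in list_frames(driver.page_source):
#     print(name, title)
# driver.switch_to.frame("Display")
```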
I am using Python to scrape data from a website. While I have been able to use Selenium to log in, I cannot identify the search field once logged in. It appears the web page loads with frames (not iframes), but I cannot access the frame with the search field.
I have tried switching to the relevant frame (which seems to work, as no error is thrown), but if I then search for the search element by CSS selector, XPath, name or id, I get a NoSuchElementException. I am using the Chrome webdriver.
Any suggestions? The page source is as follows:
<html>
<head>
<title> XYZ </title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Script-Type" content="text/javascript" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="content-language" content="en" />
<script type="text/javascript">
if (navigator && navigator.appVersion && navigator.appVersion.match("Safari") && !navigator.appVersion.match("Chrome")) {
// hack to force a window redraw
window.onload = function() {
document.getElementsByTagName('html')[0].style.backgroundColor = '#000000';
}
}
</script>
</head>
<frameset id="wc-frameset" rows="82,*" frameborder="no" border="0" framespacing="0">
<frame frameborder="0" src="/frontend/header/" name="top" marginwidth="0" marginheight="0" scrolling="no" noresize="noresize" />
<frameset cols="*,156,850,*" frameborder="NO" border="0" framespacing="0">
<frame frameborder="0" src="/frontend/fillbar/" name="fillbar" marginwidth="0" marginheight="0" scrolling="no" noresize="noresize" />
<frame frameborder="0" src="/frontend/navigation/" name="navigation" marginwidth="0" marginheight="0" scrolling="no" noresize="noresize" />
<frame frameborder="0" src="/frontend/frames/" name="content_area" marginwidth="0" marginheight="0" scrolling="no" noresize>
<frame frameborder="0" src="/frontend/fillbar/" name="fillbar" marginwidth="0" marginheight="0" scrolling="no" noresize="noresize" />
</frameset>
</frameset>
</html>
The code that I have so far is:
username = driver.find_element_by_id("username")
password = driver.find_element_by_id("password")
username.send_keys("****")
password.send_keys("****")
driver.find_element_by_class_name("bg-left").click()
#this bit works
driver.switch_to_frame("content_area")
#this seems to work too, got the frame name from the page source
search = driver.find_element_by_id("field-name")
search.send_keys("TEST")
#this fails, no element found
The target frame source code is:
<div id="field-name" class="field field-StringField">
<label for="name">Name</label> <div class="input-con"><input id="name" name="name" type="text" value=""></div>
</div>
It is possible that there are duplicate elements in the page.
Try the following in Chrome:
Open the URL in Chrome
Open the developer tools (F12)
Press Esc to open the console drawer
Select your frame (e.g. in the console's JavaScript context drop-down)
Search for matching elements using XPath in the console:
$x("//input[@id='name']")
This lists every matching element, so you can see how many there are.
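The same duplicate check can be run offline against the frame's source, e.g. driver.page_source after switching into content_area. Note also that in the frame source shown in the question, id="field-name" belongs to the wrapper <div>; the text box itself is the <input id="name">, which is the element that should receive send_keys(). A minimal sketch; IdCounter and duplicate_ids are hypothetical names:

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return {id: count} for every id that appears more than once."""
    counter = IdCounter()
    counter.feed(html)
    return {id_: n for id_, n in counter.ids.items() if n > 1}
```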
You may need to wait for the page to load completely before searching for the element. You can try something like:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver.switch_to.frame("content_area")
try:
    # wait up to 10 seconds for the element to be visible
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'name')))
except TimeoutException:
    # display page timed out error
    print("Timed out waiting for the search field to load")
search = driver.find_element(By.ID, "name")
search.send_keys("TEST")
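WebDriverWait.until() is essentially a polling loop: it calls the condition repeatedly (every 0.5 seconds by default) until it returns something truthy or the timeout expires, and then raises TimeoutException. The sketch below shows that idea in plain Python; it is a simplified illustration, not Selenium's implementation, and wait_until is a hypothetical name:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Simplified illustration of an explicit wait; raises the built-in
    TimeoutError instead of Selenium's TimeoutException.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # e.g. the located element
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With a driver, the condition could be lambda: driver.find_elements(By.ID, "name"), which returns an empty (falsy) list until the input exists.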
I am attempting to use Selenium to navigate a website that uses frames.
Here is my working Python script for part 1:
from selenium import webdriver
import time
from urllib import request
driver = webdriver.Firefox()
driver.get('http://www.lgs-hosted.com/rmtelldck.html')
driver.switch_to.frame('menu')
driver.execute_script('xSubmit()')
time.sleep(.5)
link = driver.find_element_by_id('ml1T2')
link.click()
Here is the page's markup:
<html webdriver="true">
<head></head>
<frameset id="menuframe" name="menuframe" border="0" frameborder="0" cols="170,*">
<frameset border="0" frameborder="0" rows="0,*">
<frame scrolling="AUTO" noresize="" frameborder="NO" src="heart.html" name="heart"></frame>
<frame scrolling="AUTO" noresize="" frameborder="NO" src="rmtelldcklogin.html" name="menu"></frame>
</frameset>
<frame scrolling="AUTO" noresize="" frameborder="NO" src="rmtelldcklogo.html" name="update"></frame>
</frameset>
</html>
My issue is switching frames: I'm in 'menu' and I need to get into 'update':
driver.switch_to.frame('update')
This does not work. The error tells me it's not there, even though we can clearly see that it is. Any ideas?
How do I switch from menu to update?
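You need to switch back to the default content before switching to a different frame. 'update' is not inside 'menu': both are frames of the top-level document (the <frameset> elements only define the layout), so 'update' can only be found from there: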
driver.switch_to.default_content()
driver.switch_to.frame("update")
# to prove it is working
title = driver.find_element_by_id("L_DOCTITLE").text
print(title)
Prints:
Civil Case Inquiry
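Because frames such as 'menu' and 'update' are siblings, each switch has to start from the top-level document. A minimal sketch of a context manager that makes this automatic, assuming a Selenium-style driver; in_frame is a hypothetical helper, not a Selenium API:

```python
from contextlib import contextmanager


@contextmanager
def in_frame(driver, name):
    """Switch into frame `name` of the top-level document; always switch back."""
    driver.switch_to.default_content()
    driver.switch_to.frame(name)
    try:
        yield driver
    finally:
        # Return to the top level so the next sibling frame can be reached,
        # even if the code inside the with block raised.
        driver.switch_to.default_content()


# Usage with a real WebDriver (not executed here):
# with in_frame(driver, "menu"):
#     driver.execute_script("xSubmit()")
# with in_frame(driver, "update"):
#     print(driver.find_element("id", "L_DOCTITLE").text)
```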
I am attempting to parse HTML data from a website using BeautifulSoup for Python. However, neither urllib2 nor mechanize is able to read the whole HTML. The returned data is:
<html>
<head>
<title>
EC 4.1.2.13 - Fructose-bisphosphate aldolase </title>
<meta name="description" content="Information on EC 4.1.2.13 - Fructose-bisphosphate aldolase">
<meta name="keywords" content="EC,Number,Enzyme,Pathway,Reaction,Organism,Substrate,Cofactor,Inhibitor,Compound,KM Value,KI Value,IC50 Value,pi Value,Turnover Number,pH,Temperature,Optimum,Range,Source Tissue,BLAST,Subunits,Modification,Crystallization,Stability,Purification">
</head>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<frameset cols="190,*" border="0">
<frame name="navigation" src="flat_navigation.php4?ecno=4.1.2.13&organism_list=Mycobacterium tuberculosis&Suchword=&UniProtAcc=P67475" frameborder="no">
<frameset rows="110,*" border="0">
<frame name="header" src="flat_head.php4?ecno=4.1.2.13" frameborder="no">
<frame name="flat" src="flat_result.php4?ecno=4.1.2.13&organism_list=Mycobacterium tuberculosis&Suchword=&UniProtAcc=P67475" frameborder="no">
</frameset>
</frameset>
<noframes>
<body>
<h1>EC 4.1.2.13 - Fructose-bisphosphate aldolase </h1>
More detailed information on the enzyme EC 4.1.2.13 - Fructose-bisphosphate aldolase
Sorry, but your browser doesn't support frames. Please use another browser!
</body>
</noframes>
</html>
When I open the website manually in Internet Explorer, the whole HTML can be read. Is there any way to work around this using urllib2, mechanize, or BeautifulSoup?
That's because the content is in the frames. You can either parse the page and look for the src attribute of the main <frame> element, or request the frame directly. In most browsers, you can right-click and select "Frame Properties" (or similar) to get the frame's URL.
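A minimal sketch of the first option, written for Python 3 (urllib.parse and html.parser take the place of the Python 2 modules in the question). frame_urls is a hypothetical helper, and the page URL in the usage comments is a placeholder because the question does not give the real address:

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin


class FrameSrcParser(HTMLParser):
    """Map each <frame> name to its src attribute."""

    def __init__(self):
        super().__init__()
        self.srcs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            a = dict(attrs)
            if a.get("src"):
                self.srcs[a.get("name")] = a["src"]


def frame_urls(html, page_url):
    """Return {frame name: absolute, request-ready URL} for a frameset page."""
    parser = FrameSrcParser()
    parser.feed(html)
    # urljoin resolves each relative src against the page URL; quote() then
    # percent-encodes characters such as spaces while keeping URL syntax intact.
    return {name: quote(urljoin(page_url, src), safe=":/?&=%")
            for name, src in parser.srcs.items()}


# Usage (not executed here; the URL is a placeholder):
# from urllib.request import urlopen
# page_url = "https://example.org/enzymes/flat.php4"
# urls = frame_urls(urlopen(page_url).read().decode("utf-8", "replace"), page_url)
# flat_html = urlopen(urls["flat"]).read()  # then feed this to BeautifulSoup
```

The quote() step matters here because the src values contain literal spaces ("Mycobacterium tuberculosis"), which must be percent-encoded before the request is sent.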