Can't interact with iframe after switching to it Python/Selenium [duplicate] - python

I'm trying to click a button on a page with the following structure. The button is inside a div in the body of an iframe, which itself sits inside the main body. I've already read the Stack Overflow questions about how to switch to an iframe and, as you can see below, the switch itself appears to work. My problem is that, despite following that advice, I cannot interact with anything inside the iframe I switched to.
<!doctype html>
<html ng-app="someApp" xmlns="http://www.w3.org/1999/html">
<head>
<script>a lot of scripts</script>
</head>
<body class="unwantedBody">
<iframe> some iframes</iframe>
<div> different divs </div>
<main>
some content
<iframe> multiple iframes on different nested levels </iframe>
</main>
<div> more divs </div>
<script> more scripts </script>
<div id='interesting div'>
<iframe src="uniqueString">
<!doctype html>
#document
<html>
<body>
<div>
<button id='bestButton'>
'Nice title'
</button>
</div>
</body>
</html>
</iframe>
</div>
</body>
</html>
Using Jupyter Notebook I've been able to locate the iframe and switch to it. The problem is not related to trying to interact with the iFrame too fast, because I control the speed. I've tried using Expected conditions and waiting until the iframe can be switched to, but it is irrelevant to my problem.
driver.switch_to.default_content # Making sure no other frame is selected
iframe = driver.find_element_by_xpath("//iframe[contains(@src, 'uniqueString')]")
driver.switch_to.frame(iframe)
print(iframe.get_attribute('id'))
The above code prints out "interesting div", so it successfully finds the div where the iframe is and apparently selects the div? Then I try to parse the iframe like this:
bestButton = driver.find_element_by_xpath("//button[@id = 'bestButton']")
bestButton.click()
This gives the error:
Message: element not interactable
I also tried to interact with the body within the iframe after switching to it with the above switch_to.frame(iframe), so in this example driver is already at the iframe:
document = driver.find_element_by_xpath('//html/body')
info = document.get_attribute('class')
print(info)
This prints out
unwantedBody
So somehow the driver has not switched to the iframe I specified and is still stuck on the main HTML. When loading the page in Chrome I can find the button I want with just the XPath //button[contains(@id, 'bestButton')], but in Selenium it doesn't work, because the markup is split by the #document within the iframe.
What am I missing? If it helps, the iFrame I am interested in is actually a modal window about cookie consent, which I am trying to get rid of to interact with the site.
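The check described above (switch to the frame, then read the class of the body to see which document the driver is in) can be sketched as a small helper. This is only a sketch under assumptions: switch_and_verify is a hypothetical name, driver is any Selenium WebDriver instance, and the string "xpath" is the locator strategy that By.XPATH stands for in Selenium 4. Note that default_content is a method, so it has to be called with parentheses; written without them, the statement only references the method and does not reset the context.

```python
def switch_and_verify(driver, frame_xpath, outer_body_class):
    """Switch into the iframe matched by frame_xpath and report whether
    the driver really left the outer document.

    Returns True if <body> no longer carries outer_body_class.
    """
    # Reset to the top-level document; note the call parentheses.
    driver.switch_to.default_content()

    # "xpath" is the strategy string that By.XPATH stands for.
    iframe = driver.find_element("xpath", frame_xpath)
    driver.switch_to.frame(iframe)

    # Inside the frame, //body resolves against the frame's own document.
    body = driver.find_element("xpath", "//body")
    body_class = body.get_attribute("class") or ""
    return outer_body_class not in body_class.split()
```

With the page from the question, a call might look like switch_and_verify(driver, "//iframe[contains(@src, 'uniqueString')]", "unwantedBody"). A False result would suggest the driver is still looking at the outer document.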

Related

How to prevent automatic HTML source code fixing on web browser

My original html source is below:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
As you can see there is a mistake in the title. There is an unclosed < between aaaaa and bbbbb.
When I open this page with web browsers (firefox, chrome and edge), the browsers fix the problem and change the source code to this:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
So is there a way to prevent browsers from fixing problems in the original HTML? When I browse, I want to see the original HTML source.
Note: I am using firefox geckodriver with python/selenium. So any solution that includes a configuration in firefox or python code would be OK.
There are some fundamental differences between the markup shown through View Source (Ctrl + U) and the markup shown through the Inspector (Ctrl + Shift + I).
Both are browser features that let users look at the HTML of a webpage. The main difference is that View Source shows the HTML exactly as it was delivered from the web server (application server) to the browser, whereas Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. Some of those activities include:
HTML error correction by the browser
HTML normalization by the browser
DOM manipulation by Javascript
In short, using View Source you will see the JavaScript itself but not the HTML it generates, and HTML errors may appear corrected in the Inspect Element tool. As an example:
Within View Source you may observe:
<h1>The title</h2>
Whereas in Inspect Element that would have been corrected to:
<h1>The title</h1>
This use case
Based on the above mentioned concept the following markup:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
gets corrected as:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
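To look at the markup before any browser has parsed it, one option is to bypass the browser and read the bytes as they are served. The following is a minimal sketch using only the standard library; fetch_raw_html is a hypothetical helper name, and it assumes the page can be retrieved without a browser session (no cookies, login, or JavaScript needed). The result can then be compared with driver.page_source, which in current WebDriver implementations is a serialization of the already parsed DOM.

```python
import urllib.request


def fetch_raw_html(url, encoding="utf-8"):
    """Return the markup exactly as the server (or file system) delivers it,
    without any browser-side error correction."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")
```

The same function also accepts file:// URLs, which may be handy when the page under test is a local file.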

Can't get data from inside of span-tag with beautifulsoup

I am trying to scrape an Instagram page and want to access the div tags nested inside the span tag, but I can't. The HTML of the Instagram page looks like this:
<head>--</head>
<body>
<span id="react-root" aria-hidden="false">
<form enctype="multipart/form-data" method="POST" role="presentation">…</form>
<section class="_9eogI E3X2T">
<main class="SCxLW o64aR" role="main">
<div class="v9tJq VfzDr">
<header class=" HVbuG">…</header>
<div class="_4bSq7">…</div>
<div class="fx7hk">…</div>
</div>
</main>
</section>
</body>
This is how I do it:
from bs4 import BeautifulSoup
import urllib.request as urllib2
html_page = urllib2.urlopen("https://www.instagram.com/cherrified_/?hl=en")
soup = BeautifulSoup(html_page,"lxml")
span_tag = soup.find('span') # return span-tag correctly
span_tag.find_all('div') # return empty list, why ?
Please also provide an example.
Instagram is a Single Page Application powered by React, which means its source is just a simple "empty" page that loads JavaScript to dynamically generate the content in the browser after downloading.
Click "View source" or go to view-source:https://www.instagram.com/cherrified_/?hl=en in Chrome. This is the HTML you download with urllib.request.
You can see that there is a single <span> tag, which does not include a <div> tag. (Note: <div> inside a <span> is not allowed).
Scraping instagram.com this way is not possible. It also might not be legal (I am not a lawyer).
Notes:
your HTML code example doesn't include a closing tag for <span>.
your HTML code example doesn't match the link you provide in the python snippet.
in the last line of the python snippet you probably meant span_tag.find_all('div') (note the variable name and the singular 'div').
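The point about the served source can be checked offline by counting the div tags that appear inside a span in a given piece of markup. This sketch uses the standard library's html.parser instead of BeautifulSoup so that it runs without third-party packages. SERVED_HTML is a made-up stand-in for what a JavaScript-rendered page might send, not Instagram's actual response, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DivInSpanCounter(HTMLParser):
    """Count <div> start tags that occur while a <span> is open."""

    def __init__(self):
        super().__init__()
        self.span_depth = 0
        self.div_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.span_depth += 1
        elif tag == "div" and self.span_depth > 0:
            self.div_count += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth > 0:
            self.span_depth -= 1


def count_divs_in_spans(html):
    parser = DivInSpanCounter()
    parser.feed(html)
    parser.close()
    return parser.div_count


# Hypothetical stand-in for a page whose content is built by JavaScript.
SERVED_HTML = """
<html><body>
<span id="react-root"></span>
<script src="app.js"></script>
</body></html>
"""

print(count_divs_in_spans(SERVED_HTML))
```

For the stand-in markup the count is 0, which matches the empty list returned by find_all in the question: the divs only exist after the scripts have run in a browser.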

Clicking button within span with no ID using Selenium

I'm trying to click a button in the browser with Selenium and Python.
The button is within the following markup:
<div id="generate">
<i class="fa fa-bolt"></i>
<span>Download Slides</span>
<div class="clear"></div>
</div>
Chrome's dev console tells me the button is within <span> but I have no idea how to reference the button for a .click().
Well, if you just want to click on an element without an id or name, I'd suggest three ways to do it:
Use XPath:
driver.find_element_by_xpath('//*[@id="generate"]/span')
Use a CSS selector:
driver.find_element_by_css_selector('#generate > span')
Or chain .find_elements_by_tag_name() onto the parent element:
driver.find_element_by_id('generate').find_elements_by_tag_name('span')[0]
Note that this last approach first gets the generate <div> element by its id, then finds all the <span> elements under that <div>, and finally takes the first <span> element with [0].
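In recent Selenium 4 releases the find_element_by_* helpers are no longer available, and the same lookups are written with find_element(by, value). The following is a small sketch of the CSS selector variant under that assumption; click_download_span is a hypothetical name, and "css selector" is the strategy string that By.CSS_SELECTOR stands for.

```python
def click_download_span(driver):
    """Click the <span> that is a direct child of <div id="generate">."""
    # "css selector" is the strategy string that By.CSS_SELECTOR stands for.
    span = driver.find_element("css selector", "#generate > span")
    span.click()
    return span
```

If the click handler is attached to the surrounding div rather than the span, locating the div with "#generate" and clicking that may work just as well.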

Selenium click dynamic link

I have a page structure similar to this:
<html>
<head/>
<frameset>
<frame/>
<frameset id="id1">
<frame/>
<frame id="id2">
<html>
<head/>
<body class="class1">
<form id="id3">
<input/>
<input/>
<input/>
<input/>
<table/>
<table/>
<table/>
<div id="id4">
<div id="id5">
<table id="id6">
<thead/>
<tbody>
<tr/>
<tr/>
<tr/>
<tr>
<td/>
<td/>
<td>
Text
I need to click on the dynamic link - the link and position inside the table varies, but the text is always the same.
I've tried using find_element_by_link_text and it fails.
Using xpath it can not find the form element.
Thank you.
You need to switch to the frame containing the <a> element first. Your code would look something like this:
driver.switch_to_frame('id2')
driver.find_element_by_link_text('TEXT').click()
Note that the above code is only an approximation, since your provided HTML code is only an approximation. In particular, you have a <frameset> element as a direct child of another <frameset> element, which I believe is invalid HTML. If you indeed have nested framesets, you'll need multiple calls to switch_to_frame to navigate down the frame hierarchy until your focus is on the frame containing the document with the element you're looking for.
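The idea of making several switches to walk down a frame hierarchy could be sketched as follows. This is an illustration only: switch_through_frames is a hypothetical helper, the frame references passed to it depend on the real page, and it uses the switch_to.frame form found in current Selenium releases, which accepts a frame name or id, an index, or a frame element.

```python
def switch_through_frames(driver, frame_refs):
    """Start at the top-level document and descend one frame per entry."""
    driver.switch_to.default_content()
    for ref in frame_refs:
        # Each switch is resolved relative to the frame entered before it.
        driver.switch_to.frame(ref)
```

For the structure in the question, a call such as switch_through_frames(driver, ["id2"]) followed by driver.find_element("link text", "Text").click() might be enough; "link text" is the strategy string that By.LINK_TEXT stands for.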
You can first find all a tags in the page using:
find_elements_by_tag_name
Then iterate over each a tag and check its text since text is always the same
a_tags = driver.find_elements_by_tag_name('a')
for a in a_tags:
    if a.text == 'TEXT':
        a.click()

selenium2 chrome webdriver - workaround for a blocking confirm?

Sorry my question could not be more succinct.
I'm using Selenium 2.14.0.
I have two pages, test.html and test2.html. I load test.html with the chrome webdriver, and click a link that takes me to test2.html.
test2.html contains a confirm in the body, which I think is preventing the page from loading, which is blocking my test script. Below are the html pages and my test script.
Is there any way to have selenium2 close the confirm dialog? It looks like there is a way to do it with selenium-rc (choose_ok_on_next_confirmation), but that functionality is not available in webdriver (AFAICT).
When I run my test script, I don't get the 'blocking' output until I close the confirm. Funny thing is, if I load test2.html directly from my test script, instead of clicking a link, it doesn't seem to block.
Test.html
<html>
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Test2.html
<html>
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Python selenium test script
import selenium.webdriver as webdriver
wd = webdriver.Chrome()
wd.get('file:///C:/cygwin/tmp/postest/test.html')
elem = wd.find_element_by_id('link')
elem.click()
print 'Blocking!'
Have you tried
alert = driver.switch_to_alert()
alert.accept()
?
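In current Selenium releases the same idea is usually written with the switch_to.alert property. A minimal sketch, assuming driver is a WebDriver whose page currently shows an alert or confirm dialog; accept_open_alert is a hypothetical helper name. If no dialog is open, Selenium is expected to raise an exception instead of returning.

```python
def accept_open_alert(driver):
    """Accept (click OK on) the currently open alert/confirm dialog
    and return the text it displayed."""
    alert = driver.switch_to.alert
    text = alert.text
    alert.accept()
    return text
```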
