selenium2 chrome webdriver - workaround for a blocking confirm? - python

Sorry my question could not be more succinct.
I'm using Selenium 2.14.0.
I have two pages, test.html and test2.html. I load test.html with the chrome webdriver, and click a link that takes me to test2.html.
test2.html contains a confirm in the body, which I think is preventing the page from loading, which is blocking my test script. Below are the html pages and my test script.
Is there any way to have selenium2 close the confirm dialog? It looks like there is a way to do it with selenium-rc (choose_ok_on_next_confirmation), but that functionality is not available in webdriver (AFAICT).
When I run my test script, I don't get the 'blocking' output until I close the confirm. Funny thing is, if I load test2.html directly from my test script, instead of clicking a link, it doesn't seem to block.
Test.html
<html>
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Test2.html
<html>
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Python selenium test script
import selenium.webdriver as webdriver
wd = webdriver.Chrome()
wd.get('file:///C:/cygwin/tmp/postest/test.html')
elem = wd.find_element_by_id('link')
elem.click()
print 'Blocking!'

Have you tried switching to the dialog and accepting it (wd is the driver from your script)?
alert = wd.switch_to_alert()
alert.accept()
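For reference, here is a minimal sketch of that suggestion wrapped in a helper. It assumes a current Selenium release, where the open dialog is exposed as the `driver.switch_to.alert` property; in Selenium 2.x the equivalent call is `driver.switch_to_alert()`. The helper name `accept_confirm` is hypothetical, and the sketch does not settle whether this code is reached while `click()` is still blocked.

```python
def accept_confirm(driver):
    """Accept the JavaScript confirm/alert that is currently open.

    `driver` is assumed to be a Selenium WebDriver instance (for example the
    `wd` object from the question) with a dialog already showing.
    Returns the dialog's message text.
    """
    alert = driver.switch_to.alert  # Selenium 2.x spelling: driver.switch_to_alert()
    message = alert.text            # read the text before the dialog is dismissed
    alert.accept()                  # same effect as clicking "OK"
    return message
```

Calling `accept_confirm(wd)` right after the click would be the natural place to try it.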

Related

How can I use Python to receive input from HTML in my Chrome extension?

In my extension, I have an input box, and a button. What I would like is for text to be submitted, and when the button is clicked, a summary to be generated.
This is my Python and HTML code (it's very simple).
Python Code:
import gensim
from gensim.summarization import summarize
def summary(original_text):
    return summarize(original_text)
HTML Code
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<input placeholder="Paste your text here!">
<button type="button" id='button_summarize'> Summarize! </button>
</body>
</html>
I tried to use flask but I realised it would not work since I'm trying to create an extension. Is there any way I'd get the Python to communicate with my HTML? Sample code would be helpful as I'm a bit new to this (hopefully it isn't too cumbersome given the simplicity of the task).
Thanks!
Short answer:
No, there isn't a way; you should be using JavaScript.
Long answer (for die-hard Pythonistas):
Use Brython or something similar.
You can add an event listener to the button in Brython with the following line, where function is your click handler:
document['button_summarize'].bind('click', function)

How to prevent automatic HTML source code fixing on web browser

My original html source is below:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
As you can see there is a mistake in the title. There is an unclosed < between aaaaa and bbbbb.
When I open this page with web browsers (firefox, chrome and edge), the browsers fix the problem and change the source code to this:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
So is there a way to prevent browsers from fixing problems in the original HTML? When I browse, I want to see the original HTML source.
Note: I am using firefox geckodriver with python/selenium. So any solution that includes a configuration in firefox or python code would be OK.
There is a fundamental difference between the HTML shown through View Source (Ctrl + U) and the markup shown through the Inspector (Ctrl + Shift + I).
Both are browser features that let users look at the HTML of a web page. The main difference is that View Source shows the HTML exactly as it was delivered from the web server (application server) to the browser, whereas Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. Some of those activities may include:
HTML error correction by the browser
HTML normalization by the browser
DOM manipulation by Javascript
In short, View Source shows the markup and script source exactly as delivered, without the effects of running the JavaScript or of the browser's error correction. HTML errors may appear corrected in the Inspect Element tool. As an example:
In View Source you may observe:
<h1>The title</h2>
Whereas in Inspect Element it would be corrected to:
<h1>The title</h1>
This use case
Based on the concept described above, the following markup:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
gets corrected as:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
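Since the asker is driving Firefox from Python, one way to see the markup exactly as delivered is to fetch it outside the browser, so no HTML parser touches it. The sketch below uses only the standard library; the function name `read_raw_source` is hypothetical, and it assumes the page is UTF-8 encoded and reachable without the browser's session (no cookies or login).

```python
import urllib.request


def read_raw_source(url):
    """Return the document at `url` as text, without any browser parsing."""
    with urllib.request.urlopen(url) as response:
        raw_bytes = response.read()
    # Assumption: the page is UTF-8 encoded; adjust if it declares another charset.
    return raw_bytes.decode("utf-8")
```

The returned string can then be compared with Selenium's `driver.page_source`, which typically reflects the parsed and corrected DOM instead of the bytes that were served.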

Can't interact with iframe after switching to it Python/Selenium [duplicate]

This question already has answers here:
Ways to deal with #document under iframe
(2 answers)
Switch to an iframe through Selenium and python
(3 answers)
Closed 2 years ago.
I'm trying to interact with a button on a page with the following structure. The button of interest is within a div in the body of an iframe, which is inside the main body. I've already read all the stackoverflow questions about how to switch to an iframe - as you can see below I have no issue with that. What I have an issue with is that regardless of that advice I am unable to interact with the iframe I switched to.
<!doctype html>
<html ng-app="someApp" xmlns="http://www.w3.org/1999/html">
<head>
<script>a lot of scripts</script>
</head>
<body class="unwantedBody">
<iframe> some iframes</iframe>
<div> different divs </div>
<main>
some content
<iframe> multiple iframes on different nested levels </iframe>
</main>
<div> more divs </div>
<script> more scripts </script>
<div id='interesting div'>
<iframe src="uniqueString">
<!doctype html>
#document
<html>
<body>
<div>
<button id='bestButton'>
'Nice title'
</button>
</div>
</body>
</html>
</iframe>
</div>
</body>
</html>
Using Jupyter Notebook I've been able to locate the iframe and switch to it. The problem is not related to trying to interact with the iFrame too fast, because I control the speed. I've tried using Expected conditions and waiting until the iframe can be switched to, but it is irrelevant to my problem.
driver.switch_to.default_content # Making sure no other frame is selected
iframe = driver.find_element_by_xpath("//iframe[contains(@src, 'uniqueString')]")
driver.switch_to.frame(iframe)
print(iframe.get_attribute('id'))
The above code prints out "interesting div", so it successfully finds the div where the iframe is and apparently selects the div? Then I try to parse the iframe like this:
bestButton = driver.find_element_by_xpath("//button[@id = 'bestButton']")
bestButton.click()
This gives the error:
Message: element not interactable
I also tried to interact with the body within the iframe after switching to it with the above switch_to.frame(iframe), so in this example driver is already at the iframe:
document = driver.find_element_by_xpath('//html/body')
info = document.get_attribute('class')
print(info)
This prints out
unwantedBody
So somehow the driver has not switched to the iframe I specified and is still stuck on the main HTML. When loading the page in Chrome I can find the button I want with just the XPath //button[contains(@id='bestButton')], but in Selenium it doesn't work, because the button is separated from the main document by the #document inside the iframe.
What am I missing? If it helps, the iFrame I am interested in is actually a modal window about cookie consent, which I am trying to get rid of to interact with the site.
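One detail that stands out in the snippet above is that `driver.switch_to.default_content` is referenced but never called (there are no parentheses), so it has no effect. The sketch below shows the intended sequence as a single helper. It is not a confirmed fix for this page. The name `click_button_in_iframe` is hypothetical, and the locator strategy is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH`.

```python
def click_button_in_iframe(driver, frame_xpath, button_xpath):
    """Switch into the iframe matched by `frame_xpath` and click a button in it.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    # default_content is a method: without the parentheses nothing happens.
    driver.switch_to.default_content()

    # Locate the iframe from the top-level document, then enter it.
    frame = driver.find_element("xpath", frame_xpath)
    driver.switch_to.frame(frame)

    # From here on, lookups are resolved inside the iframe's own document.
    button = driver.find_element("xpath", button_xpath)
    button.click()
    return button
```

With the structure from the question, the call would look like `click_button_in_iframe(driver, "//iframe[contains(@src, 'uniqueString')]", "//button[@id='bestButton']")`.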

Python web scraping: site returns only a few script tags, even when opened with the Selenium driver browser

I'm trying to pull the price from this site.
I tried with BeautifulSoup first, then opened the page with the Selenium webdriver browser, but got this response:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="shortcut icon" href="about:blank">
</head>
<body>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/j.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/f.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=9d98d39f-e497-2d15-7332-7e21738bd6e2"></script>
</body>
</html>
This is my python code.
from selenium import webdriver
dove_coles_url = "https://shop.coles.com.au/a/churchill-centre/product/dove-antiperspirant-deodorant-invisible-dry"
PATH = "C:\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.delete_all_cookies()
driver.get(dove_coles_url)
Thanks in advance.
Using your browser console, in the Network tab, you can see this request being made:
https://shop.coles.com.au/search/resources/store/20509/productview/bySeoUrlKeyword/dove-antiperspirant-deodorant-invisible-dry?catalogId=17056
Opening it, you'll see that it contains all the data for this product as JSON.
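A minimal sketch of reading such an endpoint from Python with the standard library follows. The function name `fetch_json` and the browser-like User-Agent value are assumptions. The site may still reject automated requests, and the structure of the returned JSON is not shown in the answer, so the sketch only decodes it.

```python
import json
import urllib.request


def fetch_json(url, user_agent="Mozilla/5.0"):
    """Download `url` and decode the response body as JSON."""
    # Assumption: a browser-like User-Agent is sent; if the site rejects the
    # request anyway, urlopen raises urllib.error.HTTPError.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing the request URL from the Network tab to `fetch_json` and printing the result is a reasonable first step for finding where the price sits in the response.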

how to capture dynamically updated web element value(document.title) in python?

I am using bottle to host a simple html page which changes the page title on load.
The HTML page code:-
<html>
<head>
<title>title</title>
<script type="text/javascript">
function initialize(){
var z=1234;
document.title = z;}
</script>
</head>
<body onload="initialize();">
hi
</body>
</html>
My bottle hosting code:
from bottle import route, run, template
@route('/:anything')
def something(anything=''):
    return template('C:/test1.html')
run(host='localhost', port=8080)
I am trying to capture the updated document.title using Python.
So far I have tried urllib, mechanize and htmlparse, but all of them return "title" instead of 1234.
A sample of the mechanize code I have tried:
from mechanize import Browser
br = Browser()
br.open("http://localhost:8080/hello")
print br.title()
Please help me.
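A likely explanation, offered as a sketch and not a definitive answer: urllib and mechanize download the HTML but never execute JavaScript, so `initialize()` never runs and the title in the markup stays "title". The block below reproduces that with the standard-library HTML parser, then shows a hypothetical helper `rendered_title` that reads the title from a browser-backed driver. It assumes a Selenium WebDriver instance is passed in and that the onload handler has finished by the time `get()` returns.

```python
from html.parser import HTMLParser

# The page from the question, as a static string.
PAGE_SOURCE = """<html>
<head>
<title>title</title>
<script type="text/javascript">
function initialize(){
var z=1234;
document.title = z;}
</script>
</head>
<body onload="initialize();">
hi
</body>
</html>"""


class TitleParser(HTMLParser):
    """Collect the text inside <title> without running any script."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def static_title(html):
    """Title as seen by a client that only parses the markup."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title


def rendered_title(driver, url):
    """Title after a real browser has loaded the page and run its scripts.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    driver.get(url)
    return driver.title
```

Here `static_title(PAGE_SOURCE)` returns "title", matching what urllib and mechanize report, because the script is treated as inert text.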
