selenium webdriver(chrome) elements differ from those of driver.page_source - python

I tried to scrape an article on Medium.
It failed because driver.page_source doesn't contain the target div.
Example: Demystifying Python Decorators in Less Than 10 Minutes https://medium.com/#adrianmarkperea/demystifying-python-decorators-in-10-minutes-ffe092723c6c
On this page, the content-holder div has the class "x y z ab ac ez af ag", but this element doesn't show up in driver.page_source.
Short code is below.
It is NOT a timeout problem.
It seems that driver.page_source is not processed by JavaScript, but I'm not sure.
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
ARTICLE = "https://medium.com/#adrianmarkperea/demystifying-python-decorators-in-10-minutes-ffe092723c6c"
driver.get(ARTICLE)
text_soup = BeautifulSoup(driver.page_source, "html5lib")
text = text_soup.select(".x.y.z.ab.ac.ez.af.ag")
print(text) # => []
I expected the output of driver.page_source to be the same as what the Elements panel of the Chrome developer tools shows.
Update: I ran an experiment.
I suspected that webdriver couldn't get the HTML source after JavaScript had processed it, so I loaded the HTML file below with Selenium.
But I got the HTML with the element removed.
Result:
webdriver and the ordinary Chrome console give the same result -> processed
<html lang="en">
<body>
<script type="text/javascript">
document.querySelector("#id").remove();
</script>
</body></html>
wget / requests -> not processed
<html lang="en">
<body>
<div id="id">
test element
</div>
<script type="text/javascript">
document.querySelector("#id").remove();
</script>
</body></html>
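To make the "not processed" case concrete, here is a minimal sketch using only the standard library. It parses the raw test file the way a non-browser client would see it. IdFinder is a hypothetical helper name, not part of any library, and this only illustrates the static case; it says nothing about what Selenium returns.

```python
from html.parser import HTMLParser

# The raw test file from the experiment, as wget/requests would receive it.
RAW_HTML = """<html lang="en">
<body>
<div id="id">
test element
</div>
<script type="text/javascript">
document.querySelector("#id").remove();
</script>
</body></html>"""


class IdFinder(HTMLParser):
    """Record the id attribute of every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append(value)


finder = IdFinder()
finder.feed(RAW_HTML)
finder.close()

# The script is never executed by a static parser, so the div is still found.
print(finder.ids)
```

A static parser treats the script body as plain text, which is why the div survives here but not in a real browser.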

Related

Scraping Amazon deals page not returning html code - python

I am currently trying to scrape this Amazon page "https://www.amazon.com/b/?ie=UTF8&node=11552285011&ref_=sv_kstore_5" with the following code:
from bs4 import BeautifulSoup
import requests
url = 'https://www.amazon.com/b/?ie=UTF8&node=11552285011&ref_=sv_kstore_5'
r = requests.get(url)
soup = BeautifulSoup(r.content)
print(soup.prettify)
However, when I run it, instead of getting the plain HTML source I get a bunch of lines that don't make much sense to me, starting like this:
<bound method Tag.prettify of <!DOCTYPE html>
<html class="a-no-js" data-19ax5a9jf="dingo"><head><script>var aPageStart = (new Date()).getTime();</script><meta charset="utf-8"/><!-- emit CSM JS -->
<style>
[class*=scx-line-clamp-]{overflow:hidden}.scx-offscreen-truncate{position:relative;left:-1000000px}.scx-line-clamp-1{max-height:16.75px}.scx-truncate-medium.scx-line-clamp-1{max-height:20.34px}.scx-truncate-small.scx-line-clamp-1{max-height:13px}.scx-line-clamp-2{max-height:35.5px}.scx-truncate-medium.scx-line-clamp-2{max-height:41.67px}.scx-truncate-small.scx-line-clamp-2{max-height:28px}.scx-line-clamp-3{max-height:54.25px}.scx-truncate-medium.scx-line-clamp-3{max-height:63.01px}.scx-truncate-small.scx-line-clamp-3{max-height:43px}.scx-line-clamp-4{max-height:73px}.scx-truncate-medium.scx-line-clamp-4{max-height:84.34px}.scx-truncate-small.scx-line-clamp-4{max-height:58px}.scx-line-clamp-5{max-height:91.75px}.scx-truncate-medium.scx-line-clamp-5{max-height:105.68px}.scx-truncate-small.scx-line-clamp-5{max-height:73px}.scx-line-clamp-6{max-height:110.5px}.scx-truncate-medium.scx-line-clamp-6{max-height:127.01
And even when I scroll down, there is nothing that really resembles structured HTML with all the info I need. What am I doing wrong? (I am a beginner, so it could be anything really.) Thank you very much!
print(soup.prettify)
does not call the method. It prints the repr of the bound method object itself (effectively soup.prettify.__repr__()). The output is
<bound method Tag.prettify of <!DOCTYPE html><html class="a-no-js" data-19ax5a9jf="dingo"><head>...
while you need to call the prettify method:
print(soup.prettify())
The output:
<html class="a-no-js" data-19ax5a9jf="dingo">
<head>
<script>
var aPageStart = (new Date()).getTime();
</script>
<meta charset="utf-8"/>
<!-- emit CSM JS -->
<style>
...
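The same difference can be reproduced without BeautifulSoup. This is a small sketch with a hypothetical Page class standing in for the soup object; its prettify() is a crude placeholder, not bs4's algorithm.

```python
class Page:
    """Hypothetical stand-in for a parsed document with a prettify() method."""

    def __init__(self, markup):
        self.markup = markup

    def prettify(self):
        # Put each tag on its own line; a crude stand-in, not bs4's algorithm.
        return self.markup.replace("><", ">\n<")


page = Page("<html><head></head></html>")

without_call = repr(page.prettify)  # the method object itself
with_call = page.prettify()         # the method's return value

print(without_call)
print(with_call)
```

The first print shows a "<bound method ...>" description, the second shows the formatted markup.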

Seemingly "garbage" result with requests

I have this webpage. When I try to get its HTML using the requests module like this:
import requests
link = "https://www.worldmarktheclub.com/resorts/7m/"
f = requests.get(link)
print(f.text)
I get a result like this:
<!DOCTYPE html>
<html><head>
<meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="-1"/>
<meta http-equiv="CacheControl" content="no-cache"/>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<link rel="shortcut icon" href="data:;base64,iVBORw0KGgo="/>
<script>
(function(){
var securemsg;
var dosl7_common;
// seemingly garbage like [Z.li]+Z._j+Z.LO+Z.SJ+"(/.{"+Z.i+","+Z.Ii+"}
</script>
<script type="text/javascript" src="/TSPD/08e841a5c5ab20007f02433a700e2faba779c2e847ad5d441605ef3d4bbde75cd229bcdb30078f66?type=9"></script>
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>
Only part of the result is shown. I can see the proper HTML when I inspect the webpage in a browser. I guess there might be an issue with the encoding of the page, but I can't figure it out. Using urllib.request + read() gives the same wrong result. How do I correct this? Thanks in advance.
As suggested by @DeepSpace, the garbage in the script is minified JS code. But why am I not getting the HTML correctly?
What you deem as "garbage" is obfuscated/minified JS code that is written in <script> tags instead of in an external JS file.
If you look at the bottom of f.text, you will see <noscript>Please enable JavaScript to view the page content.</noscript>.
requests is not a browser, hence it cannot execute the JS code that this page relies on, and the server will not allow user agents that do not support JS to access it. Setting the User-Agent header to Chrome's (Chrome/60.0.3112.90) still does not work.
You will have to resort to other tools that allow JS execution, such as selenium.
The HTML code is produced on the fly by the JavaScript code you see. Unfortunately, as said by @DeepSpace, requests does not execute JavaScript.
As an alternative, I suggest using selenium. It is a library that drives a real browser and therefore executes JavaScript.
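Before reaching for a browser, it can help to check whether a response is one of these JavaScript-only placeholder pages. Here is a rough sketch with the standard library that collects the text inside noscript tags. NoscriptCollector and noscript_messages are hypothetical names, and the sample string is a shortened version of the response shown in the question.

```python
from html.parser import HTMLParser


class NoscriptCollector(HTMLParser):
    """Collect the text inside <noscript> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.messages = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while inside a <noscript> element.
        if self._depth and data.strip():
            self.messages.append(data.strip())


def noscript_messages(html):
    """Return the list of non-empty text fragments found in <noscript> tags."""
    parser = NoscriptCollector()
    parser.feed(html)
    parser.close()
    return parser.messages


# Shortened version of the response body from the question.
sample = (
    "<html><head><script>var securemsg;</script>"
    "<noscript>Please enable JavaScript to view the page content.</noscript>"
    "</head><body></body></html>"
)
print(noscript_messages(sample))
```

A non-empty result together with an empty body is a hint, not proof, that the real content is built by JavaScript.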

selenium webdriver bot to extract bokeh images from local html

How can I improve this script so that it saves more than one image per run?
An HTML file with 3 graphs loads instantly, but one with 50 graphs takes a few minutes to load, so reloading the page for each image is not a good option.
Currently I get only one image per run. After that I get this error:
Message:
stale element reference: element is not attached to the page document.
# encoding: utf-8
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
url = 'file:\\\\\\%s/26w0.html' % (os.getcwd())
driver.get(url)
elem = driver.find_element_by_class_name("bk-tool-icon-save")
saves = driver.find_elements_by_class_name("bk-tool-icon-save")
for i in range(len(saves)):
    print i
    driver.get(url)
    elem = driver.find_element_by_class_name("bk-tool-icon-save")
    saves = driver.find_elements_by_class_name("bk-tool-icon-save")
    saves[i].click()
    elem.send_keys(Keys.ENTER)
It's written in Python, but I'm open to suggestions;
if you know a solution in Java, .NET, or any other platform or language, you are welcome to share it.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Bokeh Plot</title>
<link rel="stylesheet" href="https://cdn.bokeh.org/bokeh/release/bokeh-0.12.4.min.css" type="text/css" />
<script type="text/javascript" src="https://cdn.bokeh.org/bokeh/release/bokeh-0.12.4.min.js"></script>
<script type="text/javascript">
Bokeh.set_log_level("info");
</script>
<style>
html {
width: 100%;
height: 100%;
}
body {
width: 90%;
height: 100%;
margin: auto;
}
</style>
</head>
<body>
<div class="bk-root">
<div class="bk-plotdiv" id="c5c2aae1-5936-42f6-b4e3-9bc4b46efd06"></div>
</div>
<script type="text/javascript">
(function() {
var fn = function() {
Bokeh.safely(function() {
var docs_json = {"2797a004-7f17-48fa-b80f-105be766d58d":{"roots":{"references":[{"attributes":{"fill_alpha":{"value":0.1},"fill_color":{"value":"#1f77b4"},"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"bb128c3f-68fb-4464-9d3f-c188b2ce8614","type":"Circle"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},"ticker":{"id":"ab09a83a-42c4-44e8-919d-30796ac60bfe","type":"BasicTicker"}},"id":"b2cbd3ba-7dbd-400e-9971-227edc77fed7","type":"Grid"},{"attributes":{"format":"%.1f ml"},"id":"de7cb3ad-08b7-45c5-9050-03c4b3621d74","type":"PrintfTickFormatter"},{"attributes":{"data_source":{"id":"83b66900-4cd4-4d1c-8f66-60c8665f23e6","type":"ColumnDataSource"},"glyph":{"id":"5e858549-3ff5-41e0-b90c-d3a6976a3686","type":"Line"},"hover_glyph":null,"nonselection_glyph":{"id":"e6565ecc-8d83-440f-af86-88b488066471","type":"Line"},"selection_glyph":null},"id":"d7941470-1f87-439f-a797-accc78f7cc69","type":"GlyphRenderer"},{"attributes":{"label":{"value":"75x^4 - 542.17x\u00b3 + 396.6x\u00b2 + 131.48x + 2.0519"},"renderers":[{"id":"9b3e35b4-210e-4989-a5c3-8015c3ae34bc","type":"GlyphRenderer"}]},"id":"6f8cc559-bea7-4209-99c5-7037deead29c","type":"LegendItem"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[0.5,1,2,4,6],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"32b7f2b8-c72e-4caf-982d-49753e787111","type":"ColumnDataSource"},{"attributes":{"format":"%.1f 
ml"},"id":"e4ec96e3-88cc-4a38-9c98-013e60302040","type":"PrintfTickFormatter"},{"attributes":{"axis_label":"OD=450nm","axis_label_text_font_style":"bold","formatter":{"id":"de7cb3ad-08b7-45c5-9050-03c4b3621d74","type":"PrintfTickFormatter"},"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"},"ticker":{"id":"8ab7c0bd-3ec6-4ba1-992b-9b1d81e19180","type":"BasicTicker"}},"id":"bf5574c8-01bd-4feb-9dc1-f2d43e3ff684","type":"LinearAxis"},{"attributes":{"fill_color":{"value":"white"},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"eda96e22-e71f-4452-a30f-b9a4b8718831","type":"Circle"},{"attributes":{"dimension":1,"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},"ticker":{"id":"2e431824-d01c-4cce-abac-39320d6671b2","type":"BasicTicker"}},"id":"291ccc16-e4c1-4f7e-bae5-af652e9fd098","type":"Grid"},{"attributes":{},"id":"e4c1cac9-ed55-4ac3-8af2-e2390e64ddaa","type":"BasicTicker"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[10,20,40,80,160],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"b609e524-f77b-493f-8713-715846980d9c","type":"ColumnDataSource"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"351ccde3-723b-445d-8765-6cdcede88d0d","type":"SaveTool"},{"attributes":{"bottom_units":"screen","fill_alpha":{"value":0.5},"fill_color":{"value":"lightgrey"},"left_units":"screen","level":"overlay","line_alpha":{"value":1.0},"line_color":{"value":"black"},"line_dash":[4,4],"line_width":{"value":2},"plot":null,"render_mode":"css","right_units":"screen","top_units":"screen"},"id":"521e13fb-9301-4c4f-9622-2a15f6666f88","type":"BoxAnnotation"},{"attributes":{"axis_label":"95735 Human Apoptosense M30, 
pg/ml","axis_label_text_font_style":"bold","formatter":{"id":"30d8adbc-1bbd-49f9-9056-daf8cbc2920b","type":"PrintfTickFormatter"},"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"},"ticker":{"id":"edb441a1-4fdb-441f-82d7-ef5c9bf89a7b","type":"BasicTicker"}},"id":"7f6c26c5-6c0f-4b82-b54e-6ae035a42551","type":"LinearAxis"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},"ticker":{"id":"59097f65-8f85-4e09-a80b-af85f92666ea","type":"BasicTicker"}},"id":"43d8cc65-3140-400a-adbf-301b93cd21fb","type":"Grid"},{"attributes":{"children":[{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}]},"id":"fa51df2d-c281-4559-9fa2-05403688a06a","type":"Column"},{"attributes":{"bounds":[0,null],"callback":null,"end":2.5},"id":"4a564014-2efa-4bc1-9fc9-66d4a722c4da","type":"Range1d"},{"attributes":{"callback":null},"id":"5e810fb6-8963-47a2-b1ef-4cbf2976a259","type":"DataRange1d"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[0.5,1,2,4,6],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"83b66900-4cd4-4d1c-8f66-60c8665f23e6","type":"ColumnDataSource"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"a0a4a475-1273-44b6-b312-2e9e7a3db071","type":"WheelZoomTool"},{"attributes":{"bounds":[0,null],"callback":null,"end":2.5},"id":"be0121bb-c66c-4f58-bcc6-60dee73e3906","type":"Range1d"},{"attributes":{"format":"%d 
pmol"},"id":"e2f15250-4359-477a-993a-bace1c43204e","type":"PrintfTickFormatter"},{"attributes":{"items":[{"id":"6f8cc559-bea7-4209-99c5-7037deead29c","type":"LegendItem"}],"label_text_font_style":"bold","location":"top_left","plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"64422ead-b565-4cf7-ad62-71be497e3ad3","type":"Legend"},{"attributes":{"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"6951ff93-78b0-4410-908a-b6235b52c745","type":"Line"},{"attributes":{"fill_alpha":{"value":0.1},"fill_color":{"value":"#1f77b4"},"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"801ad24a-8d5e-4a6d-bce2-c12c6cc76c28","type":"Circle"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[9.375,18.75,37.5,75,150],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"fb013b9d-e02a-4dcf-8a39-53fd28e06b0a","type":"ColumnDataSource"},{"attributes":{"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"5c016a30-4628-476d-8271-db0cf0bb814f","type":"Line"},{"attributes":{"axis_label":"95734 Human Free carnitine(F-C)\n, 
pg/ml","axis_label_text_font_style":"bold","formatter":{"id":"7611152e-8efd-48a7-9d70-dd30e237ab41","type":"PrintfTickFormatter"},"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},"ticker":{"id":"ab09a83a-42c4-44e8-919d-30796ac60bfe","type":"BasicTicker"}},"id":"ec75fa99-4693-4ca9-91fa-dd33ed3e20ef","type":"LinearAxis"},{"attributes":{"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"e6565ecc-8d83-440f-af86-88b488066471","type":"Line"},{"attributes":{},"id":"ab09a83a-42c4-44e8-919d-30796ac60bfe","type":"BasicTicker"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"fb5572e9-b038-4e5b-8295-3ad612a31885","type":"ResetTool"},{"attributes":{},"id":"e22b7b7a-2d18-4526-80fa-e3001352a319","type":"ToolEvents"},{"attributes":{"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"99213003-bdf1-4278-b202-457d8f4b2e7f","type":"Line"},{"attributes":{"label":{"value":"75x^4 - 542.17x\u00b3 + 396.6x\u00b2 + 131.48x + 
2.0519"},"renderers":[{"id":"d7941470-1f87-439f-a797-accc78f7cc69","type":"GlyphRenderer"}]},"id":"2accad8a-abf2-468b-8b6a-2e03bc675907","type":"LegendItem"},{"attributes":{"dimension":1,"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},"ticker":{"id":"e4c1cac9-ed55-4ac3-8af2-e2390e64ddaa","type":"BasicTicker"}},"id":"8e631177-2a67-4aab-8e2b-5b826bf4208b","type":"Grid"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"3a0a1eb4-3e6d-46b8-ab89-cf8f4f1cbe2a","type":"PanTool"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"91dcbc0c-4c67-43e6-83f0-734c4e51c422","type":"WheelZoomTool"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"0d69a258-4b25-4097-b2fc-4c8ccee39867","type":"HelpTool"},{"attributes":{"bounds":[0,null],"callback":null,"end":2.5},"id":"f2efd5a3-a4d7-45b3-9dc1-7bda5ec259f5","type":"Range1d"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"},"ticker":{"id":"edb441a1-4fdb-441f-82d7-ef5c9bf89a7b","type":"BasicTicker"}},"id":"3df0545d-95fb-4a61-96d8-044988c435ee","type":"Grid"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[9.375,18.75,37.5,75,150],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"43064e70-6d90-4e15-a701-691886d6c98d","type":"ColumnDataSource"},{"attributes":{"data_source":{"id":"f80ed589-aa96-4e2b-b357-512934b32548","type":"ColumnDataSource"},"glyph":{"id":"54443d79-4bda-436a-b85c-b327adce5a92","type":"Circle"},"hover_glyph":null,"nonselection_glyph":{"id":"801ad24a-8d5e-4a6d-bce2-c12c6cc76c28","type":"Circle"},"selection_glyph":null},"id":"630f3983-2a3f-4aaa-bfca-49954efc0c25","type":"GlyphRenderer"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"3ab3dce1-6b3d-49d9-b1ea-4081b39446f5","type":"HelpTool"},
{"attributes":{"overlay":{"id":"db8c5b58-879a-4a7f-bfe3-e9dd3b9c7d1d","type":"BoxAnnotation"},"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"93e96ad3-8068-48a1-9f6f-2236e0ede821","type":"BoxZoomTool"},{"attributes":{"items":[{"id":"e52ec974-f942-44a5-bc88-eee9aaeb26ad","type":"LegendItem"}],"label_text_font_style":"bold","location":"top_left","plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"29a55b43-6628-4c9b-88f2-5241cca6911f","type":"Legend"},{"attributes":{"data_source":{"id":"b609e524-f77b-493f-8713-715846980d9c","type":"ColumnDataSource"},"glyph":{"id":"8473f88f-cf74-4f31-b31a-d2c6db34374a","type":"Line"},"hover_glyph":null,"nonselection_glyph":{"id":"5c016a30-4628-476d-8271-db0cf0bb814f","type":"Line"},"selection_glyph":null},"id":"58aa88a4-8ff8-4f52-8a36-4ae90b87d9fb","type":"GlyphRenderer"},{"attributes":{},"id":"59097f65-8f85-4e09-a80b-af85f92666ea","type":"BasicTicker"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"b6871fce-fdc4-4662-bb5d-85e9b7a4ad1e","type":"ResetTool"},{"attributes":{"fill_alpha":{"value":0.1},"fill_color":{"value":"#1f77b4"},"line_alpha":{"value":0.1},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"23334a4d-0a36-4888-a59f-a4a6250bd69c","type":"Circle"},{"attributes":{"callback":null},"id":"5b8d0607-7472-4e11-ae18-7254bfc50db3","type":"DataRange1d"},{"attributes":{},"id":"2e431824-d01c-4cce-abac-39320d6671b2","type":"BasicTicker"},{"attributes":{"axis_label":"95724 Human copeptin\n, 
pmol/L","axis_label_text_font_style":"bold","formatter":{"id":"e2f15250-4359-477a-993a-bace1c43204e","type":"PrintfTickFormatter"},"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},"ticker":{"id":"59097f65-8f85-4e09-a80b-af85f92666ea","type":"BasicTicker"}},"id":"ca22b639-f183-4932-9035-6575f43477d4","type":"LinearAxis"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"5d058e2a-1b8a-4886-bc0e-40a209e5c190","type":"SaveTool"},{"attributes":{"items":[{"id":"2accad8a-abf2-468b-8b6a-2e03bc675907","type":"LegendItem"}],"label_text_font_style":"bold","location":"top_left","plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"ba8c08f8-eb6a-430e-a73c-4293329538cc","type":"Legend"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"77ca8c4c-cdfb-47af-a837-7c76e0338f7e","type":"WheelZoomTool"},{"attributes":{},"id":"edb441a1-4fdb-441f-82d7-ef5c9bf89a7b","type":"BasicTicker"},{"attributes":{"callback":null,"column_names":["x","y"],"data":{"x":[10,20,40,80,160],"y":[0.183,0.436,0.771,1.453,2.177]}},"id":"f80ed589-aa96-4e2b-b357-512934b32548","type":"ColumnDataSource"},{"attributes":{"background_fill_alpha":{"value":0.8},"below":[{"id":"ca22b639-f183-4932-9035-6575f43477d4","type":"LinearAxis"}],"left":[{"id":"6862c7c5-cb85-4243-9c7b-9a49642710f8","type":"LinearAxis"}],"plot_width":900,"renderers":[{"id":"ca22b639-f183-4932-9035-6575f43477d4","type":"LinearAxis"},{"id":"43d8cc65-3140-400a-adbf-301b93cd21fb","type":"Grid"},{"id":"6862c7c5-cb85-4243-9c7b-9a49642710f8","type":"LinearAxis"},{"id":"8e631177-2a67-4aab-8e2b-5b826bf4208b","type":"Grid"},{"id":"521e13fb-9301-4c4f-9622-2a15f6666f88","type":"BoxAnnotation"},{"id":"ba8c08f8-eb6a-430e-a73c-4293329538cc","type":"Legend"},{"id":"d7941470-1f87-439f-a797-accc78f7cc69","type":"GlyphRenderer"},{"id":"038d2c03-db1f-4246-969a-2f003bc58214","type
":"GlyphRenderer"}],"title":{"id":"f70da64c-6a60-4c81-ac9c-486574b9c4ed","type":"Title"},"tool_events":{"id":"e22b7b7a-2d18-4526-80fa-e3001352a319","type":"ToolEvents"},"toolbar":{"id":"f75e2cb9-f423-4383-a41d-638ff4a5a417","type":"Toolbar"},"x_range":{"id":"29683b67-b091-4a62-ab1f-d40b1289e9f4","type":"DataRange1d"},"y_range":{"id":"f2efd5a3-a4d7-45b3-9dc1-7bda5ec259f5","type":"Range1d"}},"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},{"attributes":{"bottom_units":"screen","fill_alpha":{"value":0.5},"fill_color":{"value":"lightgrey"},"left_units":"screen","level":"overlay","line_alpha":{"value":1.0},"line_color":{"value":"black"},"line_dash":[4,4],"line_width":{"value":2},"plot":null,"render_mode":"css","right_units":"screen","top_units":"screen"},"id":"db8c5b58-879a-4a7f-bfe3-e9dd3b9c7d1d","type":"BoxAnnotation"},{"attributes":{"plot":null,"text":""},"id":"ba81564b-0c23-4a9e-826e-e43201b0e449","type":"Title"},{"attributes":{"fill_color":{"value":"white"},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"54443d79-4bda-436a-b85c-b327adce5a92","type":"Circle"},{"attributes":{"fill_color":{"value":"white"},"line_color":{"value":"#1f77b4"},"size":{"units":"screen","value":8},"x":{"field":"x"},"y":{"field":"y"}},"id":"a8506a17-fef3-45ea-8118-749941f73322","type":"Circle"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"35150a0a-6a13-4fb1-9d65-680844b108fd","type":"HelpTool"},{"attributes":{"format":"%d pg"},"id":"30d8adbc-1bbd-49f9-9056-daf8cbc2920b","type":"PrintfTickFormatter"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"ed65518e-1887-4053-bb39-3d162c719899","type":"ResetTool"},{"attributes":{"label":{"value":"75x^4 - 542.17x\u00b3 + 396.6x\u00b2 + 131.48x + 
2.0519"},"renderers":[{"id":"58aa88a4-8ff8-4f52-8a36-4ae90b87d9fb","type":"GlyphRenderer"}]},"id":"e52ec974-f942-44a5-bc88-eee9aaeb26ad","type":"LegendItem"},{"attributes":{"background_fill_alpha":{"value":0.8},"below":[{"id":"ec75fa99-4693-4ca9-91fa-dd33ed3e20ef","type":"LinearAxis"}],"left":[{"id":"2c445319-62e1-42a3-b270-59425c3c4100","type":"LinearAxis"}],"plot_width":900,"renderers":[{"id":"ec75fa99-4693-4ca9-91fa-dd33ed3e20ef","type":"LinearAxis"},{"id":"b2cbd3ba-7dbd-400e-9971-227edc77fed7","type":"Grid"},{"id":"2c445319-62e1-42a3-b270-59425c3c4100","type":"LinearAxis"},{"id":"291ccc16-e4c1-4f7e-bae5-af652e9fd098","type":"Grid"},{"id":"db8c5b58-879a-4a7f-bfe3-e9dd3b9c7d1d","type":"BoxAnnotation"},{"id":"64422ead-b565-4cf7-ad62-71be497e3ad3","type":"Legend"},{"id":"9b3e35b4-210e-4989-a5c3-8015c3ae34bc","type":"GlyphRenderer"},{"id":"3771367d-7eac-4366-82eb-2a6ae0581aad","type":"GlyphRenderer"}],"title":{"id":"e437cbca-8f2b-4cd4-83d8-b2c020726684","type":"Title"},"tool_events":{"id":"42e06622-2cec-43bc-8b0b-37606b4e2ed0","type":"ToolEvents"},"toolbar":{"id":"7c537b7a-60fc-4ed7-9f37-2e26d7f7ef84","type":"Toolbar"},"x_range":{"id":"5e810fb6-8963-47a2-b1ef-4cbf2976a259","type":"DataRange1d"},"y_range":{"id":"4a564014-2efa-4bc1-9fc9-66d4a722c4da","type":"Range1d"}},"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},{"attributes":{"axis_label":"OD=450nm","axis_label_text_font_style":"bold","formatter":{"id":"e4ec96e3-88cc-4a38-9c98-013e60302040","type":"PrintfTickFormatter"},"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"},"ticker":{"id":"2e431824-d01c-4cce-abac-39320d6671b2","type":"BasicTicker"}},"id":"2c445319-62e1-42a3-b270-59425c3c4100","type":"LinearAxis"},{"attributes":{"overlay":{"id":"65d909ea-fabf-4424-9b9c-08853c798990","type":"BoxAnnotation"},"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"fdd8248a-9231-4b1b-96b2-50a93cc377d6","type":"BoxZoomTool"
},{"attributes":{"overlay":{"id":"521e13fb-9301-4c4f-9622-2a15f6666f88","type":"BoxAnnotation"},"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"d311ebd6-2b3c-407f-ae8d-921f407eb08d","type":"BoxZoomTool"},{"attributes":{"data_source":{"id":"43064e70-6d90-4e15-a701-691886d6c98d","type":"ColumnDataSource"},"glyph":{"id":"6951ff93-78b0-4410-908a-b6235b52c745","type":"Line"},"hover_glyph":null,"nonselection_glyph":{"id":"99213003-bdf1-4278-b202-457d8f4b2e7f","type":"Line"},"selection_glyph":null},"id":"9b3e35b4-210e-4989-a5c3-8015c3ae34bc","type":"GlyphRenderer"},{"attributes":{"callback":null},"id":"29683b67-b091-4a62-ab1f-d40b1289e9f4","type":"DataRange1d"},{"attributes":{"plot":{"id":"64a90e8d-fce9-4bbd-b044-1ceda097ba2d","subtype":"Figure","type":"Plot"}},"id":"4091c142-49a2-4042-9cbd-705b4fba8fb3","type":"SaveTool"},{"attributes":{"axis_label":"OD=450nm","axis_label_text_font_style":"bold","formatter":{"id":"210d7bd4-012f-4d10-8822-2e1e3c576cad","type":"PrintfTickFormatter"},"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"},"ticker":{"id":"e4c1cac9-ed55-4ac3-8af2-e2390e64ddaa","type":"BasicTicker"}},"id":"6862c7c5-cb85-4243-9c7b-9a49642710f8","type":"LinearAxis"},{"attributes":{"bottom_units":"screen","fill_alpha":{"value":0.5},"fill_color":{"value":"lightgrey"},"left_units":"screen","level":"overlay","line_alpha":{"value":1.0},"line_color":{"value":"black"},"line_dash":[4,4],"line_width":{"value":2},"plot":null,"render_mode":"css","right_units":"screen","top_units":"screen"},"id":"65d909ea-fabf-4424-9b9c-08853c798990","type":"BoxAnnotation"},{"attributes":{"format":"%.1f 
L"},"id":"210d7bd4-012f-4d10-8822-2e1e3c576cad","type":"PrintfTickFormatter"},{"attributes":{},"id":"9c1cc749-78d3-499e-8166-9d0b4ca3c80e","type":"ToolEvents"},{"attributes":{"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"8473f88f-cf74-4f31-b31a-d2c6db34374a","type":"Line"},{"attributes":{"plot":null,"text":""},"id":"e437cbca-8f2b-4cd4-83d8-b2c020726684","type":"Title"},{"attributes":{"plot":{"id":"a51e3dcc-2662-4735-b1ca-48832a7264b6","subtype":"Figure","type":"Plot"}},"id":"9f06704d-ad5c-45c2-aa0d-08c4327129aa","type":"PanTool"},{"attributes":{"active_drag":"auto","active_scroll":"auto","active_tap":"auto","logo":null,"tools":[{"id":"3a0a1eb4-3e6d-46b8-ab89-cf8f4f1cbe2a","type":"PanTool"},{"id":"91dcbc0c-4c67-43e6-83f0-734c4e51c422","type":"WheelZoomTool"},{"id":"93e96ad3-8068-48a1-9f6f-2236e0ede821","type":"BoxZoomTool"},{"id":"4091c142-49a2-4042-9cbd-705b4fba8fb3","type":"SaveTool"},{"id":"ed65518e-1887-4053-bb39-3d162c719899","type":"ResetTool"},{"id":"0d69a258-4b25-4097-b2fc-4c8ccee39867","type":"HelpTool"}]},"id":"7c537b7a-60fc-4ed7-9f37-2e26d7f7ef84","type":"Toolbar"},{"attributes":{"data_source":{"id":"32b7f2b8-c72e-4caf-982d-49753e787111","type":"ColumnDataSource"},"glyph":{"id":"eda96e22-e71f-4452-a30f-b9a4b8718831","type":"Circle"},"hover_glyph":null,"nonselection_glyph":{"id":"bb128c3f-68fb-4464-9d3f-c188b2ce8614","type":"Circle"},"selection_glyph":null},"id":"038d2c03-db1f-4246-969a-2f003bc58214","type":"GlyphRenderer"},{"attributes":{"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"}},"id":"9273eb57-9cd5-47ec-b61a-391ca387cf96","type":"PanTool"},{"attributes":{"background_fill_alpha":{"value":0.8},"below":[{"id":"7f6c26c5-6c0f-4b82-b54e-6ae035a42551","type":"LinearAxis"}],"left":[{"id":"bf5574c8-01bd-4feb-9dc1-f2d43e3ff684","type":"LinearAxis"}],"plot_width":900,"renderers":[{"id":"7f6c26c5-6c0f-4b82-b54e-6ae035a42551","type":"LinearAxis"},{"id":"3df0545d-95fb-4a6
1-96d8-044988c435ee","type":"Grid"},{"id":"bf5574c8-01bd-4feb-9dc1-f2d43e3ff684","type":"LinearAxis"},{"id":"03fd8f50-9f84-414a-921e-f9215101a4d3","type":"Grid"},{"id":"65d909ea-fabf-4424-9b9c-08853c798990","type":"BoxAnnotation"},{"id":"29a55b43-6628-4c9b-88f2-5241cca6911f","type":"Legend"},{"id":"58aa88a4-8ff8-4f52-8a36-4ae90b87d9fb","type":"GlyphRenderer"},{"id":"630f3983-2a3f-4aaa-bfca-49954efc0c25","type":"GlyphRenderer"}],"title":{"id":"ba81564b-0c23-4a9e-826e-e43201b0e449","type":"Title"},"tool_events":{"id":"9c1cc749-78d3-499e-8166-9d0b4ca3c80e","type":"ToolEvents"},"toolbar":{"id":"d81630f8-b9ed-4cce-907b-0141d2e05635","type":"Toolbar"},"x_range":{"id":"5b8d0607-7472-4e11-ae18-7254bfc50db3","type":"DataRange1d"},"y_range":{"id":"be0121bb-c66c-4f58-bcc6-60dee73e3906","type":"Range1d"}},"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"},{"attributes":{},"id":"42e06622-2cec-43bc-8b0b-37606b4e2ed0","type":"ToolEvents"},{"attributes":{"plot":null,"text":""},"id":"f70da64c-6a60-4c81-ac9c-486574b9c4ed","type":"Title"},{"attributes":{"data_source":{"id":"fb013b9d-e02a-4dcf-8a39-53fd28e06b0a","type":"ColumnDataSource"},"glyph":{"id":"a8506a17-fef3-45ea-8118-749941f73322","type":"Circle"},"hover_glyph":null,"nonselection_glyph":{"id":"23334a4d-0a36-4888-a59f-a4a6250bd69c","type":"Circle"},"selection_glyph":null},"id":"3771367d-7eac-4366-82eb-2a6ae0581aad","type":"GlyphRenderer"},{"attributes":{"line_color":{"value":"#1f77b4"},"line_width":{"value":4},"x":{"field":"x"},"y":{"field":"y"}},"id":"5e858549-3ff5-41e0-b90c-d3a6976a3686","type":"Line"},{"attributes":{"format":"%d 
pg"},"id":"7611152e-8efd-48a7-9d70-dd30e237ab41","type":"PrintfTickFormatter"},{"attributes":{"dimension":1,"plot":{"id":"e44c3f42-efae-400d-a997-de22cc1d9be2","subtype":"Figure","type":"Plot"},"ticker":{"id":"8ab7c0bd-3ec6-4ba1-992b-9b1d81e19180","type":"BasicTicker"}},"id":"03fd8f50-9f84-414a-921e-f9215101a4d3","type":"Grid"},{"attributes":{},"id":"8ab7c0bd-3ec6-4ba1-992b-9b1d81e19180","type":"BasicTicker"},{"attributes":{"active_drag":"auto","active_scroll":"auto","active_tap":"auto","logo":null,"tools":[{"id":"9273eb57-9cd5-47ec-b61a-391ca387cf96","type":"PanTool"},{"id":"77ca8c4c-cdfb-47af-a837-7c76e0338f7e","type":"WheelZoomTool"},{"id":"fdd8248a-9231-4b1b-96b2-50a93cc377d6","type":"BoxZoomTool"},{"id":"5d058e2a-1b8a-4886-bc0e-40a209e5c190","type":"SaveTool"},{"id":"b6871fce-fdc4-4662-bb5d-85e9b7a4ad1e","type":"ResetTool"},{"id":"3ab3dce1-6b3d-49d9-b1ea-4081b39446f5","type":"HelpTool"}]},"id":"d81630f8-b9ed-4cce-907b-0141d2e05635","type":"Toolbar"},{"attributes":{"active_drag":"auto","active_scroll":"auto","active_tap":"auto","logo":null,"tools":[{"id":"9f06704d-ad5c-45c2-aa0d-08c4327129aa","type":"PanTool"},{"id":"a0a4a475-1273-44b6-b312-2e9e7a3db071","type":"WheelZoomTool"},{"id":"d311ebd6-2b3c-407f-ae8d-921f407eb08d","type":"BoxZoomTool"},{"id":"351ccde3-723b-445d-8765-6cdcede88d0d","type":"SaveTool"},{"id":"fb5572e9-b038-4e5b-8295-3ad612a31885","type":"ResetTool"},{"id":"35150a0a-6a13-4fb1-9d65-680844b108fd","type":"HelpTool"}]},"id":"f75e2cb9-f423-4383-a41d-638ff4a5a417","type":"Toolbar"}],"root_ids":["fa51df2d-c281-4559-9fa2-05403688a06a"]},"title":"Bokeh Application","version":"0.12.4"}};
var render_items = [{"docid":"2797a004-7f17-48fa-b80f-105be766d58d","elementid":"c5c2aae1-5936-42f6-b4e3-9bc4b46efd06","modelid":"fa51df2d-c281-4559-9fa2-05403688a06a"}];
Bokeh.embed.embed_items(docs_json, render_items);
});
};
if (document.readyState != "loading") fn();
else document.addEventListener("DOMContentLoaded", fn);
})();
</script>
</body>
</html>
I can't keep doing this manually,
because I need thousands of images.
P.S.
To #e1che:
I've fixed some things in our code:
for i in range(len(saves)):
    print(i)
    saves[i].click()
    saves[i].send_keys(Keys.ENTER)
It looks simpler and better, but it still doesn't work correctly.
Your main problem is that you give a "new" URL to your driver.
You should simplify your for loop a bit.
for i in range(len(saves)):
    print(i)
    elem_loop = elem[i]
    saves_loop = saves[i]
    # saves_loop is already the i-th save button, so click it directly
    saves_loop.click()
    elem_loop.send_keys(Keys.ENTER)  # elem_loop[i]? Are you clicking the same element for each image?
I don't see the 50 images that you refer to in the question but the main issue you are having is that you are reloading the page before clicking all the save buttons on the page. You can fix that with the code below.
# files is a list of strings, each the path to a file on disk
for file in files:
    driver.get(file)
    for e in driver.find_elements_by_class_name("bk-tool-icon-save"):
        e.click()
This should save off all the loaded images. Once you provide more details on how/when the other 50 images are loaded, we can answer that portion. You should just wait until all the images are loaded and the code above will work and save all images.
Also, elem is just the first element of the saves collection, so there's no need to store both. I'm not sure what the send_keys() call is for; it shouldn't be needed, so I removed it as well.
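The "wait until all the images are loaded" step can be sketched as a small polling helper. This is only a sketch under assumptions: it targets Python 3 and a Selenium 4 style driver, the helper name click_all_save_buttons and the expected_count, timeout and poll parameters are made up for illustration, and the string "class name" is used because it is the value of Selenium's By.CLASS_NAME.

```python
import time

SAVE_ICON_CLASS = "bk-tool-icon-save"  # class name taken from the answer above


def click_all_save_buttons(driver, expected_count, timeout=10.0, poll=0.5):
    """Wait until at least expected_count save buttons exist, then click each once.

    Hypothetical helper: expected_count is how many plots you expect on the page.
    Returns the number of buttons clicked.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "class name" is the string value of selenium's By.CLASS_NAME
        buttons = driver.find_elements("class name", SAVE_ICON_CLASS)
        if len(buttons) >= expected_count:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "only %d of %d save buttons appeared" % (len(buttons), expected_count)
            )
        time.sleep(poll)
    # The page is never reloaded between clicks, so every element stays valid.
    for button in buttons:
        button.click()
    return len(buttons)
```

With a real driver this would be called once per page, right after driver.get(file), for example click_all_save_buttons(driver, 50) if the page is expected to hold 50 plots.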

Getting html elements of another page with window.open()

I'm trying to access HTML elements from a page that I open with window.open().
These are my HTML files:
firstpage.html:
<html>
<body>
<script>
function openPage() {
return window.open("secondpage.html")
}
</script>
</body>
</html>
secondpage.html
<html>
<body>
<h1>Hello!</h1>
<p>This is the second page</p>
</body>
</html>
This is what I'm doing:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("firstpage.html")
html = browser.execute_script("return openPage().document;")
print html
What I'm expecting to get is a reference to the document element of the second page. This seems to work in the Firefox web console. When I test the script, the second page opens, but the first page seems to hang and after a while I get a dialog saying:
"A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete."
The dialog has "Stop" and "Continue" buttons. When I press "Continue", the dialog keeps reappearing; when I eventually press "Stop", the html Python variable contains the text of the dialog.
What am I doing wrong?
EDIT:
As in the #e1che answer, this is the right way to do it:
firstpage.html:
<html>
<body>
<script>
function openPage() {
window.open("secondpage.html", "secondpagewindow")
}
</script>
</body>
</html>
The python code:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("firstpage.html")
browser.execute_script("openPage()")
browser.switch_to_window("secondpagewindow")
print browser.page_source
You're on the right track,
but you forgot to switch your driver to the new window.
So right after your browser.execute_script("return openPage().document;") call,
write something like:
driver.switch_to_window("windowName")
I'll let you search here for more info and tricks ;)
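When the new window has no name, one alternative is to switch by window handle instead. The following is a minimal sketch, not the answerer's method: switch_to_new_window is a hypothetical helper, and it assumes a Selenium driver that exposes window_handles and switch_to.window(), as current Selenium releases do.

```python
def switch_to_new_window(driver, handles_before):
    """Switch the driver to the first window handle not in handles_before.

    Hypothetical helper: handles_before is the collection of handles recorded
    before the new window was opened. Returns the handle switched to.
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    raise RuntimeError("no new window was opened")


# Usage with a real driver (not run here):
#   before = set(browser.window_handles)
#   browser.execute_script("openPage()")
#   switch_to_new_window(browser, before)
#   print(browser.page_source)
```

Recording the handles before calling openPage() avoids depending on the order in which the driver lists its windows.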

Only Firefox displays HTML Code and not the page

I have this complicated problem that I can't find an answer to.
I have a Python HTTPServer running that serves webpages. These webpages are created at runtime with the help of Beautiful Soup. The problem is that Firefox shows the HTML code for the webpage instead of the rendered page. I really don't know which of these is causing the problem:
- Python HTTPServer
- Beautiful Soup
- HTML Code
In any case, I have copied part of the webpage's HTML:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>
My title
</title>
<link href="style.css" rel="stylesheet" type="text/css" />
<script src="./123_ui.js">
</script>
</head>
<body>
<div>
Hellos
</div>
</body>
</html>
Just to help you, here are the things that I have already tried-
- I have made sure that Python HTTPServer is sending the MIME header as text/html
- Copying the HTML code into a static file and opening it shows the correct page. From this I can tell that the problem is on the HTTPServer side
- The Firebug shows that is empty and "This element has no style rules. You can create a rule for it." is displayed
I just want to know if the error is in Beautiful Soup or HTTPServer or HTML?
Thanks,
Amit
Why are you adding this at the top of the document?
<?xml version="1.0" encoding="iso-8859-1"?>
That will make the browser think the entire document is XML and not XHTML. Removing that line should make it render correctly. I assume Firefox is displaying a page with a bunch of elements which you can expand/collapse to see the content like it normally would for an XML document, even though the HTTP headers might say it's text/html.
So guys,
I have finally solved this problem. The reason was that I wasn't sending the MIME header (even though I thought I was) with content type "text/html".
In Python's HTTPServer, before writing anything to wfile, you always do this:
self.send_response(200)
self.send_header("Content-Type", "text/html")
self.end_headers()
# Once you have called the above methods, you can send the HTML to the client
self.wfile.write('ANY HTML CODE YOU WANT TO WRITE')
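A fuller sketch of the fix described above, assuming Python 3's http.server module (the original thread appears to use the Python 2 equivalent). The handler sends a 200 status and an explicit Content-Type: text/html header before writing the body. PageHandler, make_server and the PAGE body are illustrative names, not taken from the question.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder body; in the question this would be the Beautiful Soup output.
PAGE = b"<html><body><div>Hellos</div></body></html>"


class PageHandler(BaseHTTPRequestHandler):  # hypothetical handler name
    def do_GET(self):
        # The status line and headers must be sent before any body bytes.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=iso-8859-1")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        # In Python 3, wfile expects bytes, not str.
        self.wfile.write(PAGE)


def make_server(port=0):
    """Create a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), PageHandler)
```

To try it, call make_server(8000).serve_forever() and open http://127.0.0.1:8000/ in the browser; the response headers shown in the browser's network tools should then include the text/html content type.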
