Looking for a solution for an automatically scrolling information board (Python)

Right now, my company has about 40 information boards scattered throughout the buildings, each showing information relevant to its area. Each unit is a small Linux-based device that is programmed to launch an RDP session, log in with a user name and password, pull up the appropriate PowerPoint presentation, and start playing it. Every 4 hours the boards go down for about 5 minutes, copy over a new version of the presentation (if applicable), and restart.
We now have "demands" for live data. Unfortunately, I believe PowerPoint is no longer an option, because we use the PowerPoint Viewer, which does not support plugins. I wanted to use Google Slides, but we also have a restriction against public-facing services like Google Drive, so that idea is out.
I was thinking of some kind of way to launch a web browser and have it rotate through a list of specified webpages (perhaps stored in a txt or csv file). I found a way to launch Firefox and have it autologin to OBIEE via python:
#source: http://obitool.blogspot.com/2012/12/automatic-login-script-for-obiee-11g_12.html
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hardcoding this information is usually not a good idea!
user = ''        # Put your user name here
password = ''    # Put your password here
serverName = ''  # Put host name here

class OBIEE11G(unittest.TestCase):
    def setUp(self):
        # Create a new profile
        self.fp = webdriver.FirefoxProfile()
        self.fp.set_preference("browser.download.folderList", 2)
        self.fp.set_preference("browser.download.manager.showWhenStarting", False)
        # Associate the profile with the Firefox Selenium session
        self.driver = webdriver.Firefox(firefox_profile=self.fp)
        self.driver.implicitly_wait(2)
        # Build the Analytics URL and save it for later
        self.base_url = "http://" + serverName + ":9704/analytics"

    def login(self):
        # Retrieve the driver created in setUp
        driver = self.driver
        # Go to the login page
        driver.get(self.base_url + "/")
        # The 11g login page has the following elements on it
        driver.find_element(By.ID, "sawlogonuser").clear()
        driver.find_element(By.ID, "sawlogonuser").send_keys(user)
        driver.find_element(By.ID, "sawlogonpwd").clear()
        driver.find_element(By.ID, "sawlogonpwd").send_keys(password)
        driver.find_element(By.ID, "idlogon").click()

    def test_OBIEE11G(self):
        self.login()

if __name__ == "__main__":
    unittest.main()
If I can use this, I would just need a way to rotate to a new webpage every 30 seconds. Any ideas / recommendations?
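A minimal sketch of the rotation loop described above, assuming Selenium with Firefox and a hypothetical `boards.txt` file holding one URL per line (lines starting with `#` are treated as comments). The helper names `parse_urls`, `rotate` and `main` are illustrative only; a board would simply call `main()` after any login step.

```python
import itertools
import time
from pathlib import Path


def parse_urls(text):
    """Return the URLs in text, one per line, ignoring blank lines and '#' comments."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")]


def rotate(navigate, urls, interval=30.0, rounds=None, sleep=time.sleep):
    """Call navigate(url) for each URL in turn, waiting `interval` seconds after each.

    rounds=None cycles forever; an integer stops after that many page loads.
    """
    pages = itertools.cycle(urls)
    if rounds is not None:
        pages = itertools.islice(pages, rounds)
    for url in pages:
        navigate(url)
        sleep(interval)


def main(url_file="boards.txt", interval=30.0):
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # Log in here first if the pages require it (e.g. the login() steps above).
        while True:
            # Re-read the file on every pass so edits take effect without a restart.
            urls = parse_urls(Path(url_file).read_text(encoding="utf-8"))
            if not urls:
                time.sleep(interval)
                continue
            rotate(driver.get, urls, interval=interval, rounds=len(urls))
    finally:
        driver.quit()
```

Passing `driver.get` as the `navigate` callable keeps the timing logic independent of the browser, so it can be checked without opening one.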

You could put a simple JavaScript snippet on each page that waits a specified time and then redirects to the next page. This is simple to implement, but it may be annoying to maintain across many HTML files.
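To keep that first option maintainable, the per-page snippets could be generated from one ordered list instead of being hand-edited in every file. A minimal sketch, assuming static HTML boards whose names (`a.html` and so on) are hypothetical: each page gets a `setTimeout` redirect to the next page, and the last one wraps around to the first.

```python
import json


def redirect_snippet(next_url, seconds=30):
    """Return a <script> tag that sends the browser to next_url after `seconds`."""
    # json.dumps produces a properly quoted and escaped JavaScript string literal.
    target = json.dumps(next_url)
    delay_ms = int(seconds * 1000)
    return ("<script>setTimeout(function () { window.location.href = %s; }, %d);</script>"
            % (target, delay_ms))


def snippets_for_ring(pages, seconds=30):
    """Map each page to a snippet that points at the next page, wrapping at the end."""
    return {page: redirect_snippet(pages[(i + 1) % len(pages)], seconds)
            for i, page in enumerate(pages)}
```

Each generated snippet would then be templated into its page just before `</body>`; changing the order only means editing the list.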
The other option is to write your copy in Markdown files, then have a single HTML page that rotates through a list of files and renders and displays the Markdown. You would then update the data by rewriting the Markdown files. It wouldn't be exactly live, but if 30-second resolution is OK you can get away with it. Something like this for the client code:
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Message Board</title>
<link rel="stylesheet" type="text/css" href="/css/style.css">
</head>
<body>
<div id="content" class="container"></div>
<!-- jQuery -->
<script src="//code.jquery.com/jquery-2.1.4.min.js" defer></script>
<!-- markdown compiler https://github.com/chjj/marked/ -->
<script src="/js/marked.min.js" defer></script>
<script src="/js/main.js" defer></script>
</body>
</html>
And the JavaScript:
// main.js
// the markdown files
var sites = ["http://mysites.com/site1.md", "http://mysites.com/site2.md", "http://mysites.com/site3.md", "http://mysites.com/site4.md"];
function render(sites) {
    window.scrollTo(0, 0);
    // take the first file and rotate it to the end of the list
    var file = sites.shift();
    sites.push(file);
    $.ajax({
        url: file,
        success: function(data) { $("#content").html(marked(data)); },
        cache: false
    });
    // pass a function to setTimeout; calling render(sites) directly here
    // would run it immediately and recurse forever
    setTimeout(function() { render(sites); }, 30000);
}
// start the loop
render(sites);
You can then use any method you would like to write out the markdown files.
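One possible way to write the Markdown files from Python, sketched under assumptions: the file names and the table layout in `render_board` are hypothetical. Writing to a temporary file and then calling `os.replace` means the browser's AJAX request sees either the old file or the new one, never a half-written file.

```python
import os
import tempfile
from datetime import datetime


def write_markdown_atomically(path, text):
    """Write text to path so a reader never sees a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace is a rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def render_board(title, rows):
    """Build a small Markdown page from (label, value) pairs (hypothetical layout)."""
    lines = [f"# {title}", "", "| Metric | Value |", "| --- | --- |"]
    lines += [f"| {label} | {value} |" for label, value in rows]
    lines += ["", f"_Updated {datetime.now():%H:%M:%S}_"]
    return "\n".join(lines) + "\n"
```

A scheduled job could then call, for example, `write_markdown_atomically("site1.md", render_board("Line 1", rows))` with whatever live values it queries.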

Related

Python Selenium take screenshots and Save as PDF for windows opened with document.write()

I'm using Selenium with Python (in Jupyter notebook). I have a number of tabs open, say 5 tabs (with all elements already finished loading) and I would like to cycle through them and do 2 things:
Take a screenshot,
(As a bonus) Print each to PDF using Chrome's built-in Save as PDF function using A4 landscape, normal scaling, and a specified default directory, with no user interaction required.
(In the code below I focus on the screenshot requirement, but would also very much like to know how to Save it as a PDF)
This code enables looping through the tabs:
numTabs = len(driver.window_handles)
for x in range(numTabs):
    driver.switch_to.window(driver.window_handles[x])
    time.sleep(0.5)
However, if I try to add a driver.save_screenshot() call as shown below, the code seems to halt after taking the first screenshot. Specifically, '0.png' is created for the first tab (with index 0) and it switches to the next tab (with index 1) but stops processing further. It doesn't even cycle to the next tab.
numTabs = len(driver.window_handles)
for x in range(numTabs):
    driver.switch_to.window(driver.window_handles[x])
    driver.save_screenshot(str(x) + '.png')  # screenshot call
    time.sleep(0.5)
Edit1:
I modified the code as shown below to start taking the screenshots from window_handles[1] instead of [0] as I don't really need the screenshot from [0], but now not even a single screenshot is generated. So it seems that the save_screenshot() call doesn't work after even the initial switch_to.window() call.
tabs = driver.window_handles
for t in range(1, len(tabs)):
    print("Processing tab " + tabs[t])
    driver.switch_to.window(tabs[t])
    # screenshot call, but the code hangs: no screenshot is taken and no further tabs are visited
    driver.save_screenshot(str(t) + '.png')
Edit2:
I've found out why my code is "hanging", no matter which method of printing to PDF or taking screenshots I use. I mentioned earlier that the new tabs were opened by clicking buttons on the main page, but on closer inspection I see that the content of the new tabs is generated using document.write(). Some AJAX code retrieves the waybillHTML content, which is then written into a new window using document.write(waybillHTML).
For more context, this is an orders system: the main page lists the orders, and a button next to each order opens a new tab with a waybill. The important part is that the waybills are generated using document.write(), triggered by the button clicks. I notice that the "View page source" option is greyed out when I right-click in the new tabs. When I use switch_to.window() to switch to one of these tabs, the Page.printToPDF call times out after 300 (seconds, I suppose).
---------------------------------------------------------------------------
TimeoutException Traceback (most recent call last)
<ipython-input-5-d2f601d387b4> in <module>
14 driver.switch_to.window(handles[x])
15 time.sleep(2)
---> 16 data = driver.execute_cdp_cmd("Page.printToPDF", printParams)
17 with open(str(x) + '.pdf', 'wb') as file:
18 file.write(base64.b64decode(data['data']))
...
TimeoutException: Message: timeout: Timed out receiving a message from renderer: 300.000
(Session info: headless chrome=96.0.4664.110)
So my refined question is: how can I use Page.printToPDF to print a page in a new window (generated dynamically with document.write()) without timing out?
One approach I tried was to do this:
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "none"
driver = webdriver.Chrome(options=chrome_options, desired_capabilities=caps)
referring to: this question
but the problem is this is too 'aggressive' and prevents the code from logging into the ordering system and doing the navigating & filtering necessary to get the browser to the point where I get the orders listing page with the waybill generating buttons (i.e. the original setup in this question).
Edit3:
At this point, I've tried something as simple as just getting the page source of the dynamically generated tabs:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
try:
    pageSrc = driver.find_element(By.XPATH, "//*").get_attribute("outerHTML")
    print(pageSrc)
except TimeoutException as error:
    print(error)
I ran this long after the tabs had finished rendering and I could see the content on the screen (I'm not using headless mode at this stage of the debugging), and even this throws a TimeoutException, so I don't think it's an issue of waiting for content to load. Somehow the driver is unable to see the content. It might be something peculiar about how these pages are generated; I don't know. I'm sure all the methods suggested in the answers for taking screenshots and saving PDFs work for otherwise normal windows. In Chrome, "View page source" remains greyed out, but I can see regular HTML content using Inspect.
Edit4:
Using Chrome's Inspection function, the page source of the dynamically generated page has this HTML structure:
Curiously, "View page source" remains greyed out even after I'm able to Inspect the contents:
Edit5:
Finally figured it out. When I clicked the button that generates the new tab, the site made an AJAX call to fetch the HTML content of the new page and used document.write() to put it into a new tab. I suppose this is outside the scope of what Selenium can handle. Instead, I imported selenium-wire, used driver.wait_for_request to intercept the AJAX call, parsed the response containing the HTML code of the new tab, and dumped the cleaned HTML into a new file. From then on, generating the PDF can easily be handled in a number of ways, as others have already suggested.
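A rough sketch of that selenium-wire approach, not the asker's actual code. It assumes a selenium-wire driver, a hypothetical `url_pattern` matching the waybill AJAX endpoint, and a response body that is plain HTML (if the endpoint returns JSON, it would need `json.loads` first). The "cleaning" step here, stripping `<script>` blocks, is also an assumption.

```python
import re


def extract_html(body: bytes, charset="utf-8"):
    """Decode a response body and drop <script> blocks (an assumed cleaning step)."""
    html = body.decode(charset, errors="replace")
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL)


def capture_new_tab_html(driver, button, url_pattern, out_path, timeout=30):
    """Click the button, wait for the matching AJAX request, and save its HTML.

    Hypothetical helper: `driver` must be a selenium-wire webdriver.
    """
    from seleniumwire.utils import decode

    del driver.requests  # forget earlier traffic so only the new call matches
    button.click()
    request = driver.wait_for_request(url_pattern, timeout=timeout)
    response = request.response
    # Undo gzip/br etc. according to the Content-Encoding header.
    body = decode(response.body, response.headers.get("Content-Encoding", "identity"))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(extract_html(body))
```

The saved file can then be opened with `driver.get("file://" + out_path)` and printed like any ordinary page.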
I'm a little confused about this part of your question:
Take a screenshot,
(As a bonus) Print each to PDF using Chrome's built-in Save as PDF function using A4 landscape, normal scaling, and a specified default directory, with no user interaction required.
The function save_screenshot saves an image file to your file system. To convert this image file to a PDF, you would have to open it and write it out to a PDF file.
That task is easy using various Python PDF modules. I have code for this, so let me know if you need it and I will add it to the code below.
Concerning printing the webpages in the tabs as PDFs, you would use execute_cdp_cmd with Page.printToPDF. The code below can be modified to support your unknown URLs scenario via the button clicks. If you need help with this, let me know.
import base64
import traceback
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
# headless mode is required for this method of printing
chrome_options.add_argument("--headless")
# disable the banner "Chrome is being controlled by automated test software"
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])
driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)

# replace this code with your button code
###
driver.get('https://abcnews.go.com')
urls = ['https://www.bbc.com/news', 'https://www.cbsnews.com/', 'https://www.cnn.com', 'https://www.newsweek.com']
for url in urls:
    driver.execute_script(f"window.open('{url}','_blank')")
    # I'm using a sleep statement, which can be replaced with
    # driver.implicitly_wait(x_seconds) or even a
    # driver.set_page_load_timeout(x_seconds) statement
    sleep(5)
###

# A4 print parameters (paper sizes are in inches)
params = {'landscape': False,
          'paperWidth': 8.27,
          'paperHeight': 11.69}

# get the open window handles, which in this case is 5
handles = driver.window_handles
size = len(handles)

# loop through the handles
for x in range(size):
    try:
        driver.switch_to.window(handles[x])
        # adjust the sleep statement as needed
        # you can also replace the sleep with
        # driver.implicitly_wait(x_seconds)
        sleep(2)
        data = driver.execute_cdp_cmd("Page.printToPDF", params)
        with open(f'file_name_{x}.pdf', 'wb') as file:
            file.write(base64.b64decode(data['data']))
        # adjust the sleep statement as needed
        sleep(3)
    except TimeoutException as error:
        print('something went wrong')
        print(''.join(traceback.format_tb(error.__traceback__)))

driver.close()
driver.quit()
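The question asks for A4 landscape at normal scaling, while the parameters above use portrait. A small sketch of a parameter builder for that case, assuming Chrome applies `landscape` on top of the portrait A4 dimensions. `landscape`, `paperWidth`, `paperHeight`, `scale` and `printBackground` are fields of the DevTools `Page.printToPDF` command; the helper names are hypothetical. Passing a full path covers the "specified default directory" requirement.

```python
import base64


def a4_print_params(landscape=True, scale=1.0):
    """Page.printToPDF parameters for A4 paper (sizes are in inches)."""
    return {
        "landscape": landscape,
        "paperWidth": 8.27,    # 210 mm
        "paperHeight": 11.69,  # 297 mm
        "scale": scale,        # 1.0 = normal scaling
        "printBackground": True,
    }


def print_current_tab(driver, path, params):
    """Print the current tab via CDP and write the decoded PDF bytes to path."""
    data = driver.execute_cdp_cmd("Page.printToPDF", params)
    with open(path, "wb") as f:
        f.write(base64.b64decode(data["data"]))
```

In the loop above, `print_current_tab(driver, os.path.join(out_dir, f"tab_{x}.pdf"), a4_print_params())` would replace the inline `execute_cdp_cmd` call, with `out_dir` being whatever directory you choose.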
Here is one of my previous answers that might be useful:
Selenium print PDF in A4 format
As your use case is to print each tab to PDF using Chrome's built-in Save as PDF function or take a screenshot, instead of opening all the additional links at the same time, you may like to open the links in an adjacent tab one by one and take each screenshot as follows:
Code Block:
windows_before = driver.current_window_handle
# open the links in an adjacent tab one by one to take a screenshot of each
# (enumerate keeps the counter from being reset on every iteration)
for i, href in enumerate(elements_href):
    driver.execute_script("window.open('" + href + "');")
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]
    driver.switch_to.window(new_window)
    driver.save_screenshot(f"image_{i}.png")
    driver.close()
    driver.switch_to.window(windows_before)
References
You can find a relevant detailed discussion in:
InvalidSessionIdException: Message: invalid session id taking screenshots in a loop using Selenium and Python
Update 2
Based on your recent Edit 3, my test page now retrieves the source for the new window with an AJAX call and writes it into the window with document.write(). The main page that the driver gets is:
test.html
<!doctype html>
<html>
<head>
<meta name=viewport content="width=device-width,initial-scale=1">
<meta charset="utf-8">
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script>
function makeRequest() {
    var req = jQuery.ajax({
        'method': 'GET',
        'url': 'http://localhost/Booboo/test/testa.html'
    });
    req.done(function(html) {
        let w = window.open('', '_blank');
        w.document.write(html);
        w.document.close();
    });
}
</script>
<body>
</body>
<script>
$(function() {
    makeRequest();
});
</script>
</html>
And the document it is retrieving, testa.html, as the source for the new window is:
testa.html
<!doctype html>
<html>
<head>
<meta name=viewport content="width=device-width,initial-scale=1">
<meta charset="utf-8">
</head>
<body>
<h1>It works!</h1>
</body>
</html>
And finally, the Selenium program gets test.html and loops until it detects that there are two windows. It then switches to the second window, prints its source, and saves a snapshot as a PDF as before, using Pillow and img2pdf.
import time
from selenium import webdriver

def save_snapshot_as_PDF(filepath):
    """
    Take a snapshot of the current window and save it as filepath.
    """
    from PIL import Image
    import img2pdf
    from tempfile import mkstemp
    import os

    if not filepath.lower().endswith('.pdf'):
        raise ValueError(f'Invalid or missing filetype for the filepath argument: {filepath}')
    # Get a temporary file for the png
    (fd, file_name) = mkstemp(suffix='.png')
    os.close(fd)
    driver.save_screenshot(file_name)
    img = Image.open(file_name).convert('RGBA')
    # Remove the alpha channel, which img2pdf cannot handle:
    background = Image.new('RGB', img.size, (255, 255, 255))
    background.paste(img, mask=img.split()[3])
    background.save(file_name, 'PNG')
    # Now convert it to a PDF:
    with open(filepath, 'wb') as f:
        f.write(img2pdf.convert([file_name]))
    os.unlink(file_name)  # delete temporary file

options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
try:
    driver.get('http://localhost/Booboo/test/test.html')
    # wait up to about 1 second (10 x 0.1 s) for the second window to appear
    trials = 10
    while trials > 0 and len(driver.window_handles) < 2:
        time.sleep(.1)
        trials -= 1
    if len(driver.window_handles) < 2:
        raise Exception("Couldn't open new window.")
    driver.switch_to.window(driver.window_handles[1])
    print(driver.page_source)
    save_snapshot_as_PDF('test.pdf')
finally:
    driver.quit()
Prints:
<html><head>
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta charset="utf-8">
</head>
<body>
<h1>It works!</h1>
</body></html>
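The trial counter in the loop above can also be written as a deadline-based wait, which reads more directly as "wait up to N seconds". A minimal sketch; `wait_for_window_count` is a hypothetical helper, and Selenium's own `WebDriverWait(driver, 5).until(expected_conditions.number_of_windows_to_be(2))` is the built-in equivalent for an exact count.

```python
import time


def wait_for_window_count(driver, count, timeout=5.0, poll=0.1):
    """Poll driver.window_handles until at least `count` windows exist.

    Returns the handles, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        handles = driver.window_handles
        if len(handles) >= count:
            return handles
        if time.monotonic() >= deadline:
            raise TimeoutError(f"expected {count} windows, found {len(handles)}")
        time.sleep(poll)
```

With it, the polling block becomes `handles = wait_for_window_count(driver, 2)` followed by `driver.switch_to.window(handles[1])`.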

How to identify call stack for dynamic loading of JavaScript files in Selenium?

I am using Selenium with ChromeDriver through Python to load a webpage.
If a page loads a JavaScript file a.js which dynamically loads another JavaScript file b.js to the page, how can I find the "call stack" for each respective script tag?
Example:
I have a page test.html which loads a.js
<html>
<head>
<script src="a.js"></script>
</head>
</html>
a.js in turn loads b.js:
// a.js
s = document.createElement('script');
s.src = 'b.js';
document.head.appendChild(s);
If I load this page through Selenium the scripts will both be loaded and execute as expected. This can be seen by running
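from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

with Chrome() as driver:
    driver.get("http://localhost/test.html")
    script_tags = driver.find_elements(By.TAG_NAME, "script")
    for s in script_tags:
        # WebElement has no .src attribute; read it with get_attribute
        print(s.get_attribute("src"))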
This gives the output
http://localhost/a.js
http://localhost/b.js
Is there any way to tell how each respective script was loaded to the page?
In the example above, I would like to know that a.js was loaded directly through the page's HTML source and b.js was loaded by a.js.
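One possible approach, sketched under assumptions (Chrome, Selenium 4 for `set_capability`): enable Chrome's performance log and inspect each `Network.requestWillBeSent` event. Its `initiator` object typically has type `"parser"` for a tag in the HTML and `"script"`, with a JavaScript call stack, for a script inserted from code. The exact initiator fields may vary between Chrome versions, and the helper names are hypothetical.

```python
import json


def script_initiators(perf_log_entries):
    """Map each requested script URL to (initiator type, initiating URL).

    Expects entries from driver.get_log("performance"); each entry's
    "message" is a JSON string wrapping a DevTools event.
    """
    result = {}
    for entry in perf_log_entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.requestWillBeSent":
            continue
        params = event["params"]
        if params.get("type") != "Script":
            continue
        initiator = params.get("initiator", {})
        kind = initiator.get("type")  # e.g. "parser" or "script"
        if kind == "script":
            # Top stack frame: the script whose code inserted the new tag.
            frames = initiator.get("stack", {}).get("callFrames", [])
            source = frames[0]["url"] if frames else None
        else:
            source = initiator.get("url")
        result[params["request"]["url"]] = (kind, source)
    return result


def main(url="http://localhost/test.html"):
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    with webdriver.Chrome(options=options) as driver:
        driver.get(url)
        for src, (kind, origin) in script_initiators(driver.get_log("performance")).items():
            print(src, kind, origin)
```

For the example page, one would expect a.js to be reported as `parser` from test.html and b.js as `script` from a.js.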

Selenium firewall issue "The requested URL was rejected.[...]" [duplicate]

I did several hours of research and asked a bunch of people on Fiverr, none of whom could solve a specific problem I have.
I installed Selenium and tried to access a website. Unfortunately, the site rejects the request and doesn't load at all. However, if I access the website with my "normal" Chrome browser, it works fine.
I tried several things, such as:
Different IPs
Deleting cookies
Incognito mode
Adding different user agents
Hiding features which might reveal that a WebDriver is being used
Nothing helped.
Here is a Screenshot of the Error I'm receiving:
And here is the very simple script I'm using:
# coding: utf8
from selenium import webdriver
url = 'https://registrierung.gmx.net/'
# Open ChromeDriver
driver = webdriver.Chrome()
# Open URL
driver.get(url)
If anyone has a solution for that I would highly appreciate it.
I'm also willing to give a big tip if someone could help me out here.
Thanks a lot!
Stay healthy everyone.
I took your code, modified it with a couple of arguments, and executed the test. Here are the observations:
Code Block:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://registrierung.gmx.net/")
print(driver.page_source)
Console Output:
<html style="" class=" adownload applicationcache blobconstructor blob-constructor borderimage borderradius boxshadow boxsizing canvas canvastext checked classlist contenteditable no-contentsecuritypolicy no-contextmenu cors cssanimations csscalc csscolumns cssfilters cssgradients cssmask csspointerevents cssreflections cssremunit cssresize csstransforms3d csstransforms csstransitions cssvhunit cssvmaxunit cssvminunit cssvwunit dataset details deviceorientation displaytable display-table draganddrop fileinput filereader filesystem flexbox fullscreen geolocation getusermedia hashchange history hsla indexeddb inlinesvg json lastchild localstorage no-mathml mediaqueries meter multiplebgs notification objectfit object-fit opacity pagevisibility performance postmessage progressbar no-regions requestanimationframe raf rgba ruby scriptasync scriptdefer sharedworkers siblinggeneral smil no-strictmode no-stylescoped supports svg svgfilters textshadow no-time no-touchevents typedarrays userselect webaudio webgl websockets websqldatabase webworkers datalistelem video svgasimg datauri no-csshyphens"><head>
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<meta http-equiv="CacheControl" content="no-cache">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="shortcut icon" href="data:;base64,iVBORw0KGgo=">
<script type="text/javascript">
(function(){
window["bobcmn"] = "10111111111010200000005200000005200000006200000001249d50ae8200000096200000000200000002300000000300000000300000006/TSPD/300000008TSPD_10130000000cTSPD_101_DID300000005https3000000b0082f871fb6ab200097a0a5b9e04f342a8fdfa6e9e63434256f3f63e9b3885e118fdacf66cc0a382208ea9dc3b70a28002d902f95eb5ac2e5d23ffe409bb24b4c57f9cb8e1a5db4bcad517230d966c75d327f561cc49e16f4300000002TS200000000200000000";
.
.
<script type="text/javascript" src="/TSPD/082f871fb6ab20009afc88ee053e87fea57bf47d9659e73d0ea3c46c77969984660358739f3d19d0?type=11"></script>
<script type="text/javascript">
(function(){
window["blobfp"] = "01010101b00400000100e803000000000d4200623938653464333234383463633839323030356632343563393735363433343663666464633135393536643461353031366131633362353762643466626238663337210068747470733a2f2f72652e73656375726974792e66356161732e636f6d2f72652f0700545350445f3734";window["slobfp"] = "08c3194e510b10009a08af8b7ee6860a22b5726420e697e4";
})();
</script>
<script type="text/javascript" src="/TSPD/082f871fb6ab20009afc88ee053e87fea57bf47d9659e73d0ea3c46c77969984660358739f3d19d0?type=12"></script>
<noscript>Please enable JavaScript to view the page content.<br/>Your support ID is: 11993951574422772310.</noscript>
</head><body>
<style>canvas {display:none;}</style><canvas width="800" height="600"></canvas></body></html>
Browser Snapshot:
Conclusion
From the page source, it's quite clear that the Chrome browsing context launched by Selenium-driven ChromeDriver gets detected and the navigation is blocked.
I could have dug deeper and provided some more insights, but surprisingly, now even manually I am unable to access the webpage. Possibly my IP is now blacklisted. Once my IP gets whitelisted, I will provide more details.
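Because the blocked response still loads "successfully", a script can fail fast by checking the page source for the challenge page instead of searching for elements that will never appear. A heuristic sketch: the markers are copied from the console output above, and treating them as a block signal (for this or any other site) is an assumption.

```python
import re

# Markers seen in the blocked page source above (an assumed, site-specific heuristic).
_MARKERS = ("/TSPD/", 'window["bobcmn"]', "Please enable JavaScript to view the page content")


def looks_like_bot_challenge(page_source):
    """Heuristically decide whether page_source is the challenge page, not the real site."""
    return any(marker in page_source for marker in _MARKERS)


def support_id(page_source):
    """Return the 'support ID' from the noscript text, if present."""
    match = re.search(r"Your support ID is:\s*(\d+)", page_source)
    return match.group(1) if match else None
```

For example, `if looks_like_bot_challenge(driver.page_source): print("blocked, support ID", support_id(driver.page_source))` right after `driver.get(...)`.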
References
You can find a couple of relevant detailed discussions in:
Can a website detect when you are using selenium with chromedriver?
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Why am I unable to download a zip file with selenium?

I am trying to use Selenium to download a test file from an HTML page. Here is the complete HTML page I have been using as the test object:
<!DOCTYPE html>
<html>
<head>
<title>Testpage</title>
</head>
<body>
<div id="page">
Example Download link:
<a href="testzip.zip">Download this testzip</a>
</div>
</body>
</html>
which I put in the current working directory, along with some example zip file renamed to testzip.zip.
My Selenium code looks as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By

profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.dir", "/tmp")
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("pdfjs.disabled", True)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/zip")
profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/zip")
browser = webdriver.Firefox(firefox_profile=profile)
browser.get('file:///home/path/to/html/testweb.html')
browser.find_element(By.XPATH, '//a[contains(text(), "Download this testzip")]').click()
However, if I run the test (with nosetests, for example), a browser opens, but after that nothing happens. There is no error message and no download; it just seems to "hang".
Any idea on how to fix this?
You are not setting up a real web server. You just have an HTML page, but no server to serve the static files. You need to at least set up a server first.
But if your question is just about downloading files, you can simply test against any public website. It will work.
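A minimal way to serve the test directory locally from the same Python process, using the standard library's `http.server` (Python 3.7+ for the `directory` argument). The helper name is hypothetical. Served this way, the zip file arrives with a guessed MIME type, which is usually `application/zip` but can differ by platform (Windows, for example, may report `application/x-zip-compressed`), so the `neverAsk.saveToDisk` list may need to match whatever the server actually sends.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP on localhost in a background thread.

    port=0 lets the OS pick a free port. Returns (server, base_url).
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, real_port = server.server_address
    return server, f"http://{host}:{real_port}"
```

Usage: `server, base = serve_directory("/home/path/to/html")`, then `browser.get(base + "/testweb.html")` instead of the `file://` URL, and `server.shutdown()` when the test is done.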

Cannot find the source of the data I need when crawling a website

I am writing a web crawler in Python, and I came across a problem while trying to find the source of the data I need.
The site I am crawling is: https://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League, and the data I want is as below:
I can find this data by browsing the page source after the page has fully loaded in Firefox:
DataStore.prime('standings', { stageId:15151, idx:0, field: 'overall'}, [[15151,32,'Manchester United',1,5,4,1,0,16,2,14,13,1,3,3,0,0,10,0,10,9,7,2,1,1,0,6,2,4,4,[[0,1190179,4,0,2,252,'England',2,'Premier League','2017/2018',32,29,'Manchester United','West Ham','Manchester United','West Ham',4,0,'w'] ......
I thought this data would be requested through AJAX, but I detected no such request in the web console.
Then I simulated the browser's behaviour (setting headers and cookies) and requested the HTML page, which returned:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=2977d8d74f63d7f8fedbea018b7a1d05">
</script>
<script>
(function() {
var z="";var b="7472797B766172207868723B76617220743D6E6577204461746528292E67657454696D6528293B7661722073746174757......";for (var i=0;i<b.length;i+=2){z=z+parseInt(b.substring(i, i+2), 16)+",";}z = z.substring(0,z.length-1); eval(eval('String.fromCharCode('+z+')'));})();
</script></head>
<body>
<iframe style="display:none;visibility:hidden;" src="//content.incapsula.com/jsTest.html" id="gaIframe"></iframe>
</body></html>
I created an .html file with the content above and opened it in Firefox, but the script did not seem to execute. Now I don't know what to do. I need some help, thanks!
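To see what that inline script actually does, its hex string can be decoded offline: the loop turns each pair of hex digits into a character code with `parseInt(..., 16)`, joins them, and evals the result of `String.fromCharCode`. Decoding the visible prefix gives `try{var xhr;var t=new Date().getTime();var statu` before the string is truncated. A small sketch of the same decoding in Python; the helper names are hypothetical.

```python
import re


def extract_hex_payload(html):
    """Return the hex string assigned to `var b` in the inline script, if any."""
    match = re.search(r'var b="([0-9A-Fa-f]+)"', html)
    return match.group(1) if match else None


def decode_hex_js(hex_string):
    """Mirror the inline loop: each hex pair is one String.fromCharCode code."""
    # latin-1 maps byte values 0-255 to the same code points, like fromCharCode.
    return bytes.fromhex(hex_string).decode("latin-1")
```

This only reveals the loader's source; the challenge it implements still expects to run inside a real browser.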
