Talk to a Chrome extension's background console via WebDriver

Talk to a Chrome extension's background console via WebDriver - Python

I am new to web programming.
I am trying to add parameters to one of my Chrome extensions.
I know I can enter, for example, "window.localStorage.setItem()" in the extension's console. However, I cannot find a way to navigate my webdriver to that extension's background page. In the past, people used chrome://extensions/extension_id as the URL to get to that page, but this method no longer seems to work.
Is there any way to go to that page directly, without telling my webdriver to enable developer mode and then click the extension?
Thanks in advance. This has been bothering me for hours.

One thing to correct here: chrome://extensions/ opens the Extensions page in Chrome, so this should work fine:
driver.get("chrome://extensions/")
Note that chrome://extensions/extension_id no longer takes you to the extension's page.
If you are interested in the Details or Options page of an extension, you should reach it through its own links, since the URI is not consistent across extensions. For example:
Extension 1:
chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/options/index.html
Extension 2:
chrome-extension://fngmhnnpilhplaeedifhccceomclgfbg/options_pages/support.html
The general format contains the extension_id, but the ID alone does not determine the full URI of the page.
If you want to work with extensions programmatically, the Management API (chrome.management) looks useful to dive into.
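If you do know the path of one of the extension's HTML pages, a rough sketch of setting a value from WebDriver could look like the following. It assumes the extension ships such a page and reads its settings from localStorage; localStorage is scoped per origin, and all HTML pages of one extension share the origin chrome-extension://<extension_id>. The two helper names, the extension ID and the page path are placeholders I made up; only driver.get and driver.execute_script are actual WebDriver calls.

```python
def extension_page_url(extension_id, page_path):
    """Build the chrome-extension:// URL for a page shipped inside an extension."""
    return "chrome-extension://{}/{}".format(extension_id, page_path.lstrip("/"))


def set_extension_local_storage(driver, extension_id, page_path, key, value):
    """Open an extension page and write one localStorage entry from it."""
    driver.get(extension_page_url(extension_id, page_path))
    # WebDriver passes the extra positional arguments to the script as
    # arguments[0] and arguments[1], so no manual quoting is needed.
    driver.execute_script(
        "window.localStorage.setItem(arguments[0], arguments[1]);", key, value
    )


# Possible usage (file name, ID and key are placeholders):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   options.add_extension("my_extension.crx")
#   driver = webdriver.Chrome(options=options)
#   set_extension_local_storage(
#       driver, "gighmmpiobklfepjocnamgkkbiglidom", "options/index.html",
#       "myKey", "myValue")
```

The extension has to be loaded into the browser that WebDriver starts, otherwise the chrome-extension:// URL will not resolve.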

Related

Scraping PDFs from a password-protected website

I work in tech support and currently have to keep our product manuals up to date by hand: I periodically check whether a manual has been updated and, if it has, replace the copy saved on our network.
I was wondering if it would be possible to build a small program that downloads all files from a supplier's website and sorts them automatically into the folders for those products, replacing the current PDFs in each folder. I must also note that the website is password protected and is organised into folders.
Is this possible with Python? I figured a small program I could run once a week or so to automatically update our manuals would be super useful (and a learning experience).
Apologies if I haven't explained the requirement well. If you have any questions, let me know.

It's certainly possible. As the other answer suggests, you will want to use libraries like Requests (handles HTTP requests) or Selenium (automated browser activity) to get through the login.
You'll need to sort through the links on a given page. This is ideally done with BeautifulSoup (an HTML parser), but could also be done with Selenium. You'll also need Requests for downloading the PDFs, and the os module for sorting the files into specific folders and replacing old files.
I strongly urge you to think through the steps first, but I hope that gives you an idea of the libraries you'll need to learn a bit about. The most challenging one to learn will be Selenium, so if you can do the login with Requests, that is much better.
If you've got a basic grasp of Python, Requests, the os module and BeautifulSoup are not difficult to pick up.
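To make the steps a bit more concrete, here is a minimal sketch. It makes several assumptions: the site uses a plain HTML form login (no captcha, no JavaScript login), the PDFs are linked directly from one listing page, and every URL and form field name is hypothetical. To keep the link-finding part runnable without extra installs, it uses the standard library's html.parser instead of BeautifulSoup; BeautifulSoup would do the same job with less code.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_pdf_links(page_html, base_url):
    """Return absolute URLs of all links on the page whose path ends in .pdf."""
    collector = _LinkCollector()
    collector.feed(page_html)
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return [url for url in absolute if urlparse(url).path.lower().endswith(".pdf")]


def download_manuals(login_url, credentials, listing_url, target_dir):
    """Log in, then save every PDF linked from listing_url into target_dir."""
    import requests  # third-party: pip install requests

    os.makedirs(target_dir, exist_ok=True)
    with requests.Session() as session:
        # Assumption: the login is a simple form POST; the session keeps the cookies.
        session.post(login_url, data=credentials).raise_for_status()
        listing = session.get(listing_url)
        listing.raise_for_status()
        for pdf_url in find_pdf_links(listing.text, listing_url):
            response = session.get(pdf_url)
            response.raise_for_status()
            filename = os.path.basename(urlparse(pdf_url).path)
            # Opening with "wb" overwrites any older copy of the manual.
            with open(os.path.join(target_dir, filename), "wb") as handle:
                handle.write(response.content)
```

You would call download_manuals once per product folder, for example with credentials={"username": "...", "password": "..."}, where the field names have to match whatever the supplier's login form actually uses.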

You can use Selenium for browser automation. It can enter the password (although "are you a robot" checks might stop you), and then you can download the PDFs simply by setting a default download location and clicking the download button. The browser will then save the files to that default download location.

Python: screenshot a specific tab each time it loads

The problem: I want to write a Python script that takes a screenshot of a website I have open in a browser each time it loads.
The thing is that I have a website with around 300 exam questions. I can go through them one by one, and I get the correction when I submit my answer. I will lose access to this questionnaire after a certain date, but I want to keep the questions (I could write them down, but laziness is strong in me, and I want to learn Python).
The "attempt": I thought of writing a simple Python script with imgkit to take the screenshots. I'm open to other suggestions, as imgkit was just the first thing I saw while looking for this, and the code looks plain and simple to me:
import imgkit
imgkit.from_url('http://webpage.com', 'out.jpg')
But I have to provide the URL for each webpage, and that would be more tedious than taking a screenshot with OS features, so I want to automate it.
The questions:
Is there a way to make Python monitor a browser tab and take a screenshot each time it reloads (which is when a new question appears)?
Or maybe get the tab's URL, pass it to imgkit and take the screenshot?
Another thing I saw is that imgkit can generate a "screenshot" from an HTML file. Can Python download the HTML code from a tab I have open in my browser?

Selenium is your friend here. It is a framework designed for testing, but it makes what you want really easy.
Selenium allows you to spin up a web browser and control it, so you can instruct it to go to the web address you want and then do things. Normally you would instruct it to click here, type into a form, etc.
In your case you only want it to open a certain address, take a screenshot, go to the next address and repeat.
Here you have a tutorial on how to do exactly what you want.
The specific code is:
from selenium import webdriver
# 1. Get the driver to manage the web browser you choose
driver = webdriver.Chrome()
# 2. Go to the web address you want
driver.get('https://python.org')
# 3. Take a screenshot
driver.save_screenshot("screenshot.png")
# 4. Close the browser and end the WebDriver session
driver.quit()
PS: For the tutorial to run, you will need to have installed the web driver that lets Selenium spin up and run Chrome. Here are the instructions for that.
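If you would rather click through the questions yourself in the Selenium-controlled window, a possible variation is to poll the current URL and take a screenshot whenever it changes. This is only a sketch: it assumes every question has its own URL (a reload of the same URL would not be noticed), and the function name, file name pattern and parameters are my own choices. driver.current_url and driver.save_screenshot are regular WebDriver members.

```python
import time


def screenshot_on_url_change(driver, out_pattern="question_{:03d}.png",
                             max_shots=300, poll_seconds=1.0, sleep=time.sleep):
    """Save a screenshot each time the driver's URL differs from the last one seen."""
    last_url = None
    taken = []  # list of (url, filename) pairs, in the order they were captured
    while len(taken) < max_shots:
        url = driver.current_url
        if url != last_url:
            filename = out_pattern.format(len(taken) + 1)
            driver.save_screenshot(filename)
            taken.append((url, filename))
            last_url = url
        sleep(poll_seconds)
    return taken


# Possible usage (the URL is a placeholder):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com/exam")   # log in and start the quiz by hand
#   screenshot_on_url_change(driver, max_shots=300)
#   driver.quit()
```

A short pause after the URL changes may be needed if the page is slow to render, since the screenshot is taken as soon as the new URL is seen.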

Refresh a tab on a browser (Python)

My program opens a certain page using
webbrowser.open(url)
How can I reload the tab containing the URL several times?
I could use sleep to set how long it waits before reloading.
But how do I refresh the tab after that? (Not open it in a new tab.)

I don't think it is possible to implement a pure Python solution for this that works across different browsers. A solution I can think of uses JavaScript. Roughly, the idea is to create an HTML file that contains an iframe with the URL you want, plus JavaScript that reloads the iframe at a regular interval. Then use the webbrowser module to open that file.
This may sound ugly, but it may be the only solution given the security restrictions of a browser.
*If you are interested in this idea, I can help you write the code for it.
Hope this helps.
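A minimal sketch of that iframe idea follows. The function names and the file name are made up for illustration, and it assumes the target site allows itself to be framed; sites that send X-Frame-Options or a frame-ancestors policy will refuse to load inside the iframe.

```python
import html
import pathlib
import webbrowser


def build_refresh_wrapper(url, interval_seconds):
    """Return an HTML page that frames `url` and reloads the frame on a timer."""
    safe_url = html.escape(url, quote=True)
    interval_ms = int(interval_seconds * 1000)
    return f"""<!DOCTYPE html>
<html>
<body style="margin:0">
<iframe id="frame" src="{safe_url}" style="border:0;width:100vw;height:100vh"></iframe>
<script>
  // Re-assigning src makes the browser load the framed page again.
  setInterval(function () {{
    var frame = document.getElementById("frame");
    frame.src = frame.src;
  }}, {interval_ms});
</script>
</body>
</html>
"""


def open_with_auto_refresh(url, interval_seconds, path="refresh_wrapper.html"):
    """Write the wrapper page to disk and open it in the default browser."""
    wrapper = pathlib.Path(path).resolve()
    wrapper.write_text(build_refresh_wrapper(url, interval_seconds), encoding="utf-8")
    webbrowser.open(wrapper.as_uri())
```

For example, open_with_auto_refresh("https://example.com", 30) would open one tab and reload the framed page every 30 seconds without opening new tabs.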

EDIT: below is my OLD answer, I'm not deleting it because it shows the ambiguity in the docs, and could possibly serve as a learning experience to someone.
If you read the docs, they make it sound like it's possible. However, it is not possible with this module. Furthermore, it seems that no matter what value you give to "new", it always opens a new tab. Perhaps this behavior is specific to my system or browser (IE9), but I believe it is more likely a bug in the module.
I investigated further; there are questions about this all over SO. You can't do it with webbrowser or anything built into Python.
If you install Selenium, you should be able to do what you want.
I am assuming you don't have access to the source code of this webpage; otherwise, you could just use HTML to do the refresh. If you don't want to install Selenium and don't have source access, then you need to make a wrapper for the webpage and use HTML/JS to refresh the wrapper.
The docs say:
webbrowser.open(url, new=0, autoraise=True)
Display url using the default browser. If new is 0, the url is opened in the same browser window if possible. If new is 1, a new browser window is opened if possible. If new is 2, a new browser page (“tab”) is opened if possible. If autoraise is True, the window is raised if possible (note that under many window managers this will occur regardless of the setting of this variable).
so...
to refresh the page, it would just be:
for i in range(refresh_limit):
    time.sleep(wait_time)
    webbrowser.open(url)
^^^ this does not actually work^^^
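For comparison, the Selenium route mentioned above could look roughly like this. It is a sketch, not tested against your page: the function name is mine, while refresh_limit and wait_time are the same placeholders as in the loop above. driver.get and driver.refresh are standard WebDriver methods, and refresh reloads the page in the same tab.

```python
import time


def reload_repeatedly(driver, url, refresh_limit, wait_time, sleep=time.sleep):
    """Open url once, then reload the same tab refresh_limit times."""
    driver.get(url)
    for _ in range(refresh_limit):
        sleep(wait_time)
        driver.refresh()


# Possible usage:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   reload_repeatedly(driver, "https://example.com", refresh_limit=10, wait_time=30)
#   driver.quit()
```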

Handling "Download" window by Firefox WebDriver

I'm experimenting with Firefox's WebDriver and I'd like to ask whether it is possible to handle the "Download" window (to accept or decline an incoming download request).
For example, a simple piece of code:
from selenium import webdriver
dr = webdriver.Firefox()
# Firefox shows up.
# Let's say I want to download Python.
dr.get('http://python.org/ftp/python/3.1.3/python-3.1.3.msi')
# The download window shows up.
# How could I accept the download request?
# As I understand it, the property below should return
# two handles, but I get only the main window's handle.
handles = dr.window_handles
# It seems WebDriver cannot "see" this popup.
I've experimented with this a little bit but haven't found the solution yet. I'd really appreciate any hint.
Many thanks,
- V

One solution to this is changing WebDriver's Firefox profile to automatically download some MIME types to a given directory.
I'm not sure how (or if) this is exposed in Python, but it's mentioned on the Ruby bindings page on the Selenium wiki (under "Tweaking Firefox preferences").
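In the Python bindings this can be done through Firefox preferences as well. The sketch below assumes a recent Selenium release, where FirefoxOptions has a set_preference method; the helper names, the download directory and the MIME types are placeholders to adapt. The three preference names are regular Firefox settings.

```python
def firefox_download_preferences(download_dir, mime_types):
    """Preferences that make Firefox save the given MIME types without asking."""
    return {
        "browser.download.folderList": 2,  # 2 = use the directory set below
        "browser.download.dir": download_dir,
        # Comma-separated list of MIME types to save without a prompt.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def build_firefox_driver(download_dir, mime_types):
    """Start Firefox with the download preferences applied."""
    from selenium import webdriver  # third-party: pip install selenium

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_preferences(download_dir, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


# Possible usage (directory and MIME type are examples only):
#   driver = build_firefox_driver("/tmp/downloads", ["audio/wav"])
#   driver.get("http://www.address.com/file.wav")
```

The MIME type has to match what the server actually sends in its Content-Type header, otherwise the prompt still appears.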

I don't think that this is the sort of thing that WebDriver was built for, but I'll take a crack at it. There is nothing built into the Firefox WebDriver to handle this specific case, but there are a few approaches you may take.
You can open Firefox with the profile that your WebDriver script uses and edit the preferences to always save the file instead of asking (Options > Applications > Windows Installer Package - set to "Save File"). After that, however, there is no way to tell from the browser that the file is downloading, unless you get redirected to a 404 page. Instead, you can check whether the file exists in the Downloads directory configured for the same profile (Options > Main > Downloads). If it is still downloading, the filename will be WhateverFileName.ext.part
Your other option is to use the non-visual HtmlUnit driver, navigate to the download link, click it, and then get the page source (which will be the contents of the file). This works with text files; I can't guarantee that it will work similarly for binaries, nor do I know how the content would be encoded in such a case.
Best of luck
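The check for the finished file could be sketched in Python like this. It relies on the .part naming described above and on knowing the final file name in advance; the function name, timeout and polling interval are arbitrary choices.

```python
import os
import time


def wait_for_download(file_path, timeout=60.0, poll=0.5):
    """Return True once file_path exists and its .part companion is gone."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(file_path) and not os.path.exists(file_path + ".part"):
            return True
        time.sleep(poll)
    return False  # still missing or still downloading when the timeout expired
```

For example, wait_for_download("/path/to/Downloads/python-3.1.3.msi", timeout=300) would block until the installer has been written completely or five minutes have passed; the directory here is a placeholder.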

I came across this when I was trying to download a file using Capybara
and got halted by the download prompt.
SeleniumHQ : Selenium WebDriver
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.dir'] = "/Downloads"
profile['browser.download.folderList'] = 2
profile['browser.helperApps.neverAsk.saveToDisk'] = "audio/wav"
driver = Selenium::WebDriver.for :firefox, :profile => profile
driver.navigate.to('http://www.address.com/file.wav')
This just downloads the file into the specified directory, with no prompt :)
The other option that I came across was
Determining file MIME types to autosave using Firefox & Watir-WebDriver
I have tried Watir before and it proved very useful.

Selenium with Python, how do I get the page output after running a script?

I'm not sure how to find this information. I have found a few tutorials so far about using Python with Selenium, but none have so much as touched on this. I am able to run some basic test scripts through Python that automate Selenium, but it just shows the browser window for a few seconds and then closes it. I need to get the browser output into a string or variable (ideally), or at least save it to a file, so that Python can do other things with it (parse it, etc.). I would appreciate it if anyone could point me towards resources on how to do this. Thanks

Using Selenium WebDriver and Python, you simply access the .page_source property to get the source of the current page.
For example, using the Firefox() driver:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com/')
print(driver.page_source)
driver.quit()
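If you also want the page on disk for later parsing, a small helper along these lines should do. The helper name and file name are my own; page_source is the same property as above.

```python
def save_page_source(driver, path, encoding="utf-8"):
    """Write the current page's HTML to path and also return it as a string."""
    html_text = driver.page_source
    with open(path, "w", encoding=encoding) as handle:
        handle.write(html_text)
    return html_text


# Possible usage, before calling driver.quit():
#   source = save_page_source(driver, "example.html")
#   # `source` is a normal str, ready to be parsed or searched
```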

There's a Selenium.getHtmlSource() method in Java; most likely it is also available in Python. It returns the source of the current page as a string, so you can do whatever you want with it.

OK, so here is how I ended up doing this, for anyone who needs it in the future.
You have to use Firefox for this to work.
1) Create a new Firefox profile (not necessary, but ideal so as to separate this from normal Firefox usage). There is plenty of information online on how to do this; the steps depend on your OS.
2) Get the Firefox plugin https://addons.mozilla.org/en-US/firefox/addon/2704/ (it automatically saves all pages for a given domain name). You need to configure it with the domains you intend to auto-save.
3) Then just start the Selenium server with the profile you created (below is an example for Linux):
cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/
That's it. It will now save all the pages for a given domain name whenever Selenium visits them. Selenium does create a bunch of garbage pages too, so you may want to delete these, for example by matching their names with a simple regex. From there, it is up to you how to manipulate the saved pages.
