Selenium-Phantomjs download csv - python

I am trying to download (save to disk) a CSV file from a dialog box using PhantomJS. With a Firefox profile this would be fairly simple: you just set the browser profile preferences. Any suggestions on how a CSV or Excel file could be downloaded in PhantomJS?
This is how it would be done using the Firefox driver:
profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference("browser.download.folderList",2)
profile.set_preference("browser.download.dir",self.opts['output_dir'])
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', ('application/octet-stream,application/msexcel'))
I am using the PhantomJS driver:
webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true','--local-storage-path=/tmp'])
and I am looking for a way to set properties that force save-to-disk and set the MIME type of the data. Currently, without those properties set, the PhantomJS driver does not download the file.
I have read links about avoiding the dialog box etc., but in this case it is needed.

I was recently struggling with a similar issue. I ended up switching to a different web driver because it gives relatively easy access to network traffic. The problem is that if a file is not directly on the page and is instead transferred in, you cannot see it in PhantomJS. A few people are working on workarounds, but most of my files were being transferred, so it was easier for me to capture the network traffic with WebDriver + Firebug + NetExport.
However, a very hacky way to do this in PhantomJS would be something like this:
phantomjs.exe file_to_run.js > my_log.txt
Here you simply save the console contents to a file. You are likely to get errors and other messages in that file, but you could clean it up since you're only looking for the CSV.
From my understanding, PhantomJS is limited because the developer has a very specific idea of how it should work. For example, Flash support was discontinued. There is no easy native way of downloading and saving files the way you can in Firefox. You could launch another web browser and download through that. However, I think the easiest way to do this is to use CasperJS, which plays nicely with PhantomJS.
A good example of using CasperJS to download files can be found here: casperjs download csv file
I believe the major issue with using CasperJS, however, is that large files are not well supported. Is there a specific reason you prefer to use a headless browser?
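Another workaround that is sometimes used with headless drivers, sketched here as an assumption rather than anything PhantomJS supports natively: log in and navigate with the driver, then hand the session cookies to a plain HTTP client and fetch the CSV outside the browser. The helper names and the example URL are hypothetical; `driver.get_cookies()` is the Selenium call that returns a list of dicts with `name` and `value` keys. This only helps when the file is reachable through a direct URL.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url, cookies, dest_path):
    """Fetch url with the browser session's cookies and save the body to disk."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())


# Hypothetical usage, assuming `driver` is an already logged-in WebDriver:
# download_with_cookies("https://example.com/report.csv",
#                       driver.get_cookies(), "/tmp/report.csv")
```

If the download is triggered by a form post or by JavaScript instead of a plain link, the request would have to be reproduced by hand, which this sketch does not attempt.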

Related

Can I manipulate an image in the browser with github pages?

Is it possible to upload and manipulate a photo in the browser with GitHub-pages? The photo doesn't need to be stored else than just for that session.
PS. I'm new to this area and I am using python to manipulate the photo.
GitHub pages allows users to create static HTML sites. This means you have no control over the server which hosts the HTML files - it is essentially a file server.
Even if you did have full control over the server (e.g. if you hosted your own website), it would not be possible to allow the client to run Python code in the browser since the browser only interprets JavaScript.
Therefore the easiest solution is to rewrite your code in JavaScript.
Failing this, you could offer a download link to your Python script, and have users trust you enough to run it on their computer.

Scraping PDF's from a password protected website

I work in tech support and currently have to keep our product manuals updated manually, by periodically checking whether each one has an update and, if it does, replacing the copy saved on our network.
I was wondering if it would be possible to build a small program to quickly download all files from a supplier's website and have them automatically sorted into the given folders for those products, replacing the current PDFs in each folder. I must also note that the website is password protected and is organised into folders.
Is this possible with Python? I figured a small program I could perhaps run once a week or something to automatically update our manuals would be super useful (and a learning experience).
Apologies if I haven't explained the requirement well, any questions let me know.
It's certainly possible. As the other answer suggests, you will want to use libraries like Requests (handles HTTP requests) or Selenium (automated browser activity) to get through the login.
You'll need to sort through the links on a given page. Ideally this is done with BeautifulSoup (an HTML parser), but it could also be done with Selenium. You'll also need Requests for downloading the PDFs, and the os module for sorting the files into specific folders and replacing the old ones.
I strongly urge you to think through the steps, but I hope that gives you an idea of the libraries you'll need to learn a bit about. The most challenging thing to learn will be Selenium, so if you can do the login with Requests, that is much better.
If you've got a basic grasp of Python, then Requests, the os module and BeautifulSoup are not difficult to pick up.
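The link-sorting and folder-mapping steps above can be sketched roughly as follows. To keep the example dependency-free it swaps in the standard library's `html.parser` for BeautifulSoup; the class and function names and the example URLs are hypothetical, and the login and download steps are left out because they depend on how the supplier's site authenticates.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def pdf_links(html, base_url):
    """Return absolute URLs of all links that point to a .pdf file."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return [url for url in absolute if urlparse(url).path.lower().endswith(".pdf")]


def local_path(url, root):
    """Mirror the site's folder layout under root so a new PDF replaces the old copy."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return os.path.join(root, *parts)
```

Writing each downloaded file to `local_path(url, root)` in `"wb"` mode overwrites the previous version, which covers the "replace the current PDF" requirement as long as the local folders mirror the site's layout.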
You can use Selenium for browser automation. It could enter the password (although "are you a robot" checks might stop you), and then you can download the PDFs simply by setting a default download location and clicking the download button. The browser will then save the files to that location.

Talk to Chrome extension's background console via webdriver

I am new to web programming.
I am trying to add parameters to one of my Chrome extensions.
I know I can enter, for example, "window.localStorage.setItem()" in the extension console. However, I cannot find a way to navigate my webdriver to that extension's background page. In the past, people used chrome//extensions:extension_id as the URL to get to that page, but this method no longer seems to work.
Is there any way to go to that page directly, without telling my webdriver to click developer mode and then click the extension?
Thanks in advance. This has been bothering me for hours.
One thing to correct here: chrome://extensions/ opens the Extensions page in the Chrome browser, so this should work fine:
driver.get("chrome://extensions/")
Note that chrome://extensions/extension_id no longer takes you to the extension's page.
If you are interested in the Details or Options of an extension, you should access them via the links only, since the URI is not consistent across extensions. For example:
Extension 1
chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/options/index.html
Extension 2
chrome-extension://fngmhnnpilhplaeedifhccceomclgfbg/options_pages/support.html
The general format includes the extension_id, but that alone does not define the entire URI that gets you to the page.
If you want to work with extensions, the Management API looks useful to dive into.
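Since the path after the extension ID differs per extension, one way to find it is to read it from the extension's own manifest.json. The sketch below assumes you have the unpacked extension's manifest available as a dict; the function name is hypothetical, while `options_page`, `options_ui.page` and `background.page` are manifest keys that an extension may or may not declare. Whether the driver can then open the resulting URL is something to verify in your own setup.

```python
def extension_page_url(extension_id, manifest, kind="options"):
    """Build a chrome-extension:// URL from an extension's manifest.json contents."""
    if kind == "options":
        page = manifest.get("options_page") or manifest.get("options_ui", {}).get("page")
    elif kind == "background":
        page = manifest.get("background", {}).get("page")
    else:
        raise ValueError(f"unknown page kind: {kind}")
    if not page:
        raise LookupError(f"manifest declares no {kind} page")
    return f"chrome-extension://{extension_id}/{page.lstrip('/')}"


# Hypothetical usage, assuming `driver` is a Chrome WebDriver with the extension loaded:
# driver.get(extension_page_url(ext_id, manifest, kind="background"))
```

An extension whose background is declared only as scripts has no `background.page` entry, in which case the function raises `LookupError` instead of guessing a path.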

Grabbing a .jsp generated PNG in Python

I am trying to grab a PNG image which is being dynamically generated with JSP in a web service.
I have tried visiting the web page it is contained in and grabbing the image src attribute; but the link leads to a .jsp file. Reading the response with urllib2 just shows a lot of gibberish.
I also need to do this while logged into the web service in question, using mechanize. This seems to exclude the option of grabbing a screenshot with webkit2png or similar.
Thanks for any suggestions.
If you use urllib correctly (for example, making sure your User-Agent resembles a browser etc), the "gibberish" you get back is the actual file, so you just need to write it out to disk (open the file with "wb" for writing in binary mode) and re-read it with some image-manipulation library if you need to play with it. Or you can use urlretrieve to save it directly on the filesystem.
If that's a jsp, chances are that it takes parameters, which might be appended by the browser via javascript before the request is done; you should look at the real request your browser makes, before trying to reproduce it. You can do that with the Chrome Developer Tools, Firefox LiveHTTPHeaders, etc etc.
I do hope you're not trying to break a captcha.
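A minimal sketch of the "write the gibberish to disk" step, using Python 3's `urllib.request` in place of the question's `urllib2`. The function names and the default User-Agent string are assumptions, and the logged-in session (handled with mechanize in the question) is not covered here. The signature check is a cheap way to tell a real PNG from an HTML error or login page.

```python
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """True if the bytes start with the 8-byte PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_png(url, dest_path, user_agent="Mozilla/5.0"):
    """Download url and write the body to disk in binary mode if it is a PNG."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    if not is_png(data):
        raise ValueError("response is not a PNG; check the request parameters")
    with open(dest_path, "wb") as out:
        out.write(data)
```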

Handling "Download" window by Firefox WebDriver

I'm experimenting with Firefox's WebDriver and I'd like to ask if it is possible to handle the "Download" window (to accept or decline an incoming download request)?
For example, a simple piece of code:
from selenium import webdriver
dr = webdriver.Firefox()
# Firefox shows up.
# Let's say I want to download Python.
dr.get('http://python.org/ftp/python/3.1.3/python-3.1.3.msi')
# The download window shows up.
# How could I accept the download request?
# As I understand it, the property below should return
# two handles, but I get only the main window's handle.
handles = dr.window_handles
# Seems like WebDriver cannot "see" this popup.
I've experimented with this a little bit but haven't found the solution yet. I'd really appreciate any hint.
Many thanks,
- V
One solution to this is changing WebDriver's Firefox profile to automatically download some MIME types to a given directory.
I'm not sure how (or if) this is exposed in Python, but it's mentioned on the Ruby bindings page on the Selenium wiki (under "Tweaking Firefox preferences").
I don't think that this is the sort of thing that WebDriver was built for, but I'll take a crack at it. There is nothing built into the Firefox WebDriver to handle this specific case, but there are a few approaches you may take.
You can open FF with the profile that your WebDriver script uses and edit the preferences to always save the file instead of asking (Options > Applications > Windows Installer Package - set to "Save File"). Now, however, there's no way to tell from the browser that the file is downloading unless you get redirected to a 404 page. If not, you can check whether the file exists in the Downloads directory for the same profile (Options > Main > Downloads). If it's still in the process of downloading, the filename will be WhateverFileName.ext.part
Your other option is to use the non-visual HTMLUnit driver, navigate to the download link, click it, and then get the page source (which will be the contents of the file). This works with textual files; I can't guarantee that it will work similarly for binaries, nor do I know how the content will be encoded in such a case.
Best of luck
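The check for the finished file versus the `.part` file described above could be polled along these lines. This is only a sketch: the function name and the timeout values are assumptions, and it relies on Firefox keeping `<name>.part` next to the target while the download is in progress, as the answer describes.

```python
import os
import time


def wait_for_download(path, timeout=60.0, poll=0.5):
    """Return True once path exists and its Firefox .part file is gone."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path) and not os.path.exists(path + ".part"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A call such as `wait_for_download("/Downloads/python-3.1.3.msi")` right after `dr.get(...)` lets the script continue only once the file is complete, and returns `False` if the timeout passes first.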
I came across this when I was trying to download a file using Capybara
and got halted by the download prompt.
SeleniumHQ : Selenium WebDriver
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.dir'] = "/Downloads"
profile['browser.download.folderList'] = 2
profile['browser.helperApps.neverAsk.saveToDisk'] = "audio/wav"
driver = Selenium::WebDriver.for :firefox, :profile => profile
driver.navigate.to('http://www.address.com/file.wav')
This just downloads the file into the directory specified, no prompt :)
The other option that I came across was
Determining file MIME types to autosave using Firefox & Watir-WebDriver
I have tried Watir before and it proved very useful.
