I want to use PhantomJS in Python. I googled this problem but couldn't find proper solutions.
I found that os.popen() might be a good choice, but I couldn't pass arguments to it.
subprocess.Popen() may be a workable solution for now, but I want to know whether there's a better one.
Is there a way to use PhantomJS in Python?
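For reference, the subprocess route mentioned above can pass arguments cleanly if you give it a list instead of a single string. A minimal sketch, assuming Python 3.7+ and a `phantomjs` binary on your PATH; `run_phantomjs` and its parameters are hypothetical names for illustration, not part of any library:

```python
import subprocess


def run_phantomjs(script, *script_args, executable="phantomjs", timeout=60):
    """Run a PhantomJS script with arguments and return its stdout as text.

    Hypothetical helper: assumes `executable` is on PATH or is a full path.
    """
    # One list element per argument: spaces or quotes inside an argument are
    # passed through verbatim, and no shell is involved.
    cmd = [executable, script, *script_args]
    result = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,      # raise TimeoutExpired if the process hangs
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Usage (assumes a render.js script of your own):
# print(run_phantomjs("render.js", "http://example.com", "out.png"))
```

Because there is no shell, each list element should arrive in the script's `system.args` unchanged.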
The easiest way to use PhantomJS in Python is via Selenium. The simplest installation method is:
Install NodeJS
Using Node's package manager, install PhantomJS: npm -g install phantomjs-prebuilt
Install Selenium (in your virtualenv, if you are using one)
After installation, you can use PhantomJS as simply as:
from selenium import webdriver
driver = webdriver.PhantomJS() # or add to your PATH
driver.set_window_size(1024, 768) # optional
driver.get('https://google.com/')
driver.save_screenshot('screen.png') # save a screenshot to disk
sbtn = driver.find_element_by_css_selector('button.gbqfba')
sbtn.click()
If your system path environment variable isn't set correctly, you'll need to specify the exact path as an argument to webdriver.PhantomJS(). Replace this:
driver = webdriver.PhantomJS() # or add to your PATH
... with the following:
driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs')
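If you are not sure where the binary ended up, you can resolve the path in Python before creating the driver. A small sketch using the standard library's `shutil.which`; `find_phantomjs` is a hypothetical helper, and the candidate path in the usage comment is just the npm location shown above:

```python
import os
import shutil


def find_phantomjs(candidates=(), name="phantomjs"):
    """Return a usable path to the PhantomJS binary, or None.

    Checks PATH first (what webdriver.PhantomJS() relies on by default),
    then any explicit candidate paths, in order.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in candidates:
        # Accept only existing files that we are allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Usage (assumes an older Selenium release that still provides webdriver.PhantomJS):
# path = find_phantomjs(["/usr/local/lib/node_modules/phantomjs/lib/phantom/bin/phantomjs"])
# driver = webdriver.PhantomJS(executable_path=path)
```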
References:
http://selenium-python.readthedocs.io/
How do I set a proxy for phantomjs/ghostdriver in python webdriver?
https://dzone.com/articles/python-testing-phantomjs
PhantomJS recently dropped Python support altogether. However, PhantomJS now embeds Ghost Driver.
A new project has since stepped up to fill the void: ghost.py. You probably want to use that instead:
from ghost import Ghost
ghost = Ghost()
with ghost.start() as session:
    page, extra_resources = session.open("http://jeanphi.me")
    assert page.http_status == 200 and 'jeanphix' in page.content
Now that GhostDriver comes bundled with PhantomJS, it has become even more convenient to use it through Selenium.
I tried the Node installation of PhantomJS, as suggested by Pykler, but in practice I found it to be slower than the standalone installation of PhantomJS. I guess the standalone installation didn't provide these features earlier, but as of v1.9 it very much does.
Install PhantomJS (http://phantomjs.org/download.html). If you are on Linux, the following instructions will help: https://stackoverflow.com/a/14267295/382630
Install Selenium using pip.
Now you can use it like this:
import selenium.webdriver
driver = selenium.webdriver.PhantomJS()
driver.get('http://google.com')
# do some processing
driver.quit()
Here's how I test JavaScript using PhantomJS and Django:
mobile/test_no_js_errors.js:
var page = require('webpage').create(),
    system = require('system'),
    url = system.args[1],
    status_code;
page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function(item) {
        console.log('  ', item.file, ':', item.line);
    });
};
page.onResourceReceived = function(resource) {
    if (resource.url == url) {
        status_code = resource.status;
    }
};
page.open(url, function (status) {
    if (status == "fail" || status_code != 200) {
        console.log("Error: " + status_code + " for url: " + url);
        phantom.exit(1);
    }
    phantom.exit(0);
});
mobile/tests.py:
import subprocess
from django.test import LiveServerTestCase
class MobileTest(LiveServerTestCase):
    def test_mobile_js(self):
        args = ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url]
        result = subprocess.check_output(args)
        # No output means no error (check_output returns bytes on Python 3)
        self.assertEqual(result, b"")
Run tests:
manage.py test mobile
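One caveat with `check_output`: when the script calls `phantom.exit(1)`, it raises `CalledProcessError` before `assertEqual` runs, so the logged JS errors only show up inside the exception. A sketch that always captures the output and exit code, so the assertion message can include them (`run_script` is a hypothetical helper; the PhantomJS command line in the comment is the one from the test above):

```python
import subprocess


def run_script(args, timeout=60):
    """Run a command and return (exit_code, combined stdout/stderr text).

    Unlike check_output, a non-zero exit code does not raise, so the
    caller can put the captured output into its failure message.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode to str
        timeout=timeout,
    )
    return result.returncode, result.stdout


# Inside MobileTest.test_mobile_js (sketch):
#     code, output = run_script(
#         ["phantomjs", "mobile/test_no_js_errors.js", self.live_server_url])
#     self.assertEqual(code, 0, output)
#     self.assertEqual(output, "", output)
```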
The answer by @Pykler is great, but the Node requirement is outdated. The comments in that answer suggest the simpler answer, which I've put here to save others time:
Install PhantomJS
As @Vivin-Paliath points out, it's a standalone project, not part of Node.
Mac:
brew install phantomjs
Ubuntu:
sudo apt-get install phantomjs
etc
Set up a virtualenv (if you haven't already):
virtualenv mypy # doesn't have to be "mypy". Can be anything.
. mypy/bin/activate
If your machine has both Python 2 and 3, you may need to run virtualenv-3.6 mypy or similar.
Install selenium:
pip install selenium
Try a simple test, like this one borrowed from the docs:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.PhantomJS()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
This is what I do (Python 3.3). I was processing huge lists of sites, so giving up after a timeout was vital for the job to run through the entire list.
import subprocess
command = "phantomjs --ignore-ssl-errors=true "+<your js file for phantom>
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# make sure phantomjs has time to download/process the page
# but if we get nothing after 30 sec, just move on
try:
    output, errors = process.communicate(timeout=30)
except subprocess.TimeoutExpired as e:
    print("\t\tException: %s" % e)
    process.kill()
    # collect whatever was produced before the kill, so `output` is always defined
    output, errors = process.communicate()
# output will be weird, decode to utf-8 to save heartache
phantom_output = ''
for out_line in output.splitlines():
    phantom_output += out_line.decode('utf-8')
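On Python 3.5+, `subprocess.run()` can do the timeout handling for you: when the timeout expires it kills the child and raises `TimeoutExpired`. A sketch of the same idea with an argument list instead of `shell=True`; `fetch_with_timeout`, `scrape.js` and `sites` are hypothetical placeholders:

```python
import subprocess


def fetch_with_timeout(cmd, timeout=30):
    """Run cmd (a list of args) and return decoded stdout, or None on timeout.

    subprocess.run() kills the child itself when the timeout expires and
    raises TimeoutExpired, so a batch loop can simply skip that item.
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    # Decode once, replacing undecodable bytes instead of crashing.
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical batch loop:
# for site in sites:
#     out = fetch_with_timeout(["phantomjs", "--ignore-ssl-errors=true", "scrape.js", site])
#     if out is None:
#         print("timed out:", site)
#         continue
```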
If using Anaconda, install with:
conda install PhantomJS
In your script:
from selenium import webdriver
driver = webdriver.PhantomJS()
It works perfectly.
In case you are using Buildout, you can easily automate the installation process that Pykler describes using the gp.recipe.node recipe.
[nodejs]
recipe = gp.recipe.node
version = 0.10.32
npms = phantomjs
scripts = phantomjs
That part installs node.js as a binary (at least on my system) and then uses npm to install PhantomJS. Finally, it creates an entry point bin/phantomjs, which you can pass to the PhantomJS webdriver. (To install Selenium, you need to specify it in your egg requirements or in the Buildout configuration.)
driver = webdriver.PhantomJS('bin/phantomjs')
Related
I am using a trial website to learn how to upload files with Python Selenium when the upload window is not part of the HTML. The upload window is a system-level dialog. This is already solved using Java (Stack Overflow link(s) below). If this is not possible via Python, then I intend to switch to Java for this task.
BUT,
Dear fellow Python lovers, why shouldn't it be possible using Python WebDriver/Selenium? Hence this quest.
Solved in Java for URL: http://www.zamzar.com/
Solution (& Java code) on Stack Overflow: How to handle windows file upload using Selenium WebDriver?
This is my Python code, which should be self-explanatory, including Chrome driver download links.
The task (uploading a file) I am trying, in brief:
Website: https://www.wordtopdf.com/
Note_1: I don't need this tool for any work, as there are far better packages for Word-to-PDF conversion. This is just for learning & polishing Python Selenium code.
Note_2: You will have to painstakingly enter 2 paths into my code below after downloading and unzipping the Chrome driver (link in the comments below). The 2 paths are: [a] the path of a (any) Word file & [b] the path of the unzipped Chrome driver.
My Code:
import random
import time
from selenium import webdriver
UNZIPPED_DRIVER_PATH = 'C:/Users/....' # You need to specify this on your computer
driver = webdriver.Chrome(executable_path = UNZIPPED_DRIVER_PATH)
# Driver download links below (check which version of chrome you are using if you don't know it beforehand):
# Chrome Driver 74 Download: https://chromedriver.storage.googleapis.com/index.html?path=74.0.3729.6/
# Chrome Driver 73 Download: https://chromedriver.storage.googleapis.com/index.html?path=73.0.3683.68/
New_Trial_URL = 'https://www.wordtopdf.com/'
driver.get(New_Trial_URL)
time.sleep(random.uniform(4.5, 5.5)) # Time to load the page in peace
Find_upload = driver.find_element_by_xpath('//*[@id="file-uploader"]')
WORD_FILE_PATH = 'C:/Users/..../some_word_file.docx' # You need to specify this on your computer
Find_upload.send_keys(WORD_FILE_PATH) # Not working, no action happens here
Based on something very similar in Java (How to handle windows file upload using Selenium WebDriver?), this should work like a charm. But voilà... total failure, and thus a chance to learn something new.
I have also tried:
Click_Alert = Find_upload.click()
Click_Alert(driver).send_keys(WORD_FILE_PATH)
Did not work. 'Alert' should be a built-in class according to these 2 links (https://seleniumhq.github.io/selenium/docs/api/py/webdriver/selenium.webdriver.common.alert.html & Selenium-Python: interact with system modal dialogs).
But the 'Alert' class from the links above doesn't seem to exist in my Python setup, even after executing
from selenium import webdriver
@All readers: I hope this doesn't take much of your time and we all get to learn something from this.
Cheers
Your XPath '//*[@id="file-uploader"]' finds an <a> tag,
but there is a hidden <input type="file"> (behind the <a>), and that is the element you have to use:
import selenium.webdriver
your_file = "/home/you/file.doc"
your_email = "you@example.com"
url = 'https://www.wordtopdf.com/'
driver = selenium.webdriver.Firefox()
driver.get(url)
file_input = driver.find_element_by_xpath('//input[@type="file"]')
file_input.send_keys(your_file)
email_input = driver.find_element_by_xpath('//input[@name="email"]')
email_input.send_keys(your_email)
driver.find_element_by_id('convert_now').click()
Tested with Firefox 66 / Linux Mint 19.1 / Python 3.7 / Selenium 3.141.0
EDIT: The same method for uploading on zamzar.com
This was a situation I hadn't seen before (so it took me longer to create a solution): the page has an <input type="file"> hidden under a button, but it doesn't use it to upload the file. It dynamically creates a second <input type="file"> which it uses to upload the file (or maybe even many files; I didn't test it).
import selenium.webdriver
from selenium.webdriver.support.ui import Select
import time
your_file = "/home/furas/Obrazy/37884728_1975437959135477_1313839270464585728_n.jpg"
#your_file = "/home/you/file.jpg"
output_format = 'png'
url = 'https://www.zamzar.com/'
driver = selenium.webdriver.Firefox()
driver.get(url)
#--- file ---
# it has to wait because the page has to create a second `input[@type="file"]`
file_input = driver.find_elements_by_xpath('//input[@type="file"]')
while len(file_input) < 2:
    print('len(file_input):', len(file_input))
    time.sleep(0.5)
    file_input = driver.find_elements_by_xpath('//input[@type="file"]')
file_input[1].send_keys(your_file)
#--- format ---
select_input = driver.find_element_by_id('convert-format')
select = Select(select_input)
select.select_by_visible_text(output_format)
#--- convert ---
driver.find_element_by_id('convert-button').click()
#--- download ---
time.sleep(5)
driver.find_elements_by_xpath('//td[@class="status last"]/a')[0].click()
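The `while len(file_input) < 2` loop above keeps polling forever if the second input never appears. A hedged sketch of a generic polling helper with a deadline (`wait_until` is a hypothetical name; Selenium's own `WebDriverWait(...).until(...)` does the same job if you prefer to stay within the library):

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns the truthy value; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# With the driver above (sketch):
# def two_file_inputs():
#     found = driver.find_elements_by_xpath('//input[@type="file"]')
#     return found if len(found) >= 2 else None
# file_input = wait_until(two_file_inputs, timeout=20)
# file_input[1].send_keys(your_file)
```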
I am trying to download a file from a link on a web page. However, I get an annoying warning: "This type of file can harm...anyway? Keep, Discard". I tried several options to avoid the warning, but I still get it. I am using Robot Framework, but I am writing a new keyword in Python.
@keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument("--disable-web-security")
    options.add_argument("--allow-running-insecure-content")
    options.add_argument("--safebrowsing-disable-extension-blacklist")
    options.add_argument("--safebrowsing-disable-download-protection")
    prefs = {'safebrowsing.enabled': 'true'}
    options.add_experimental_option("prefs", prefs)
    self.open_browser(url, 'chrome', alias=None, remote_url=False, desired_capabilities=options.to_capabilities(), ff_profile_dir=None)
Can someone please suggest a way to disable the download warning?
I found an answer after some research. For some reason (maybe a bug), open_browser does not set capabilities for Chrome.
So the alternative is to use 'create_webdriver'. It worked with the following code:
@keyword('open "${url}" in chrome browser')
def open_chrome_browser(self, url):
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_argument("--disable-web-security")
    options.add_argument("--allow-running-insecure-content")
    options.add_argument("--safebrowsing-disable-extension-blacklist")
    options.add_argument("--safebrowsing-disable-download-protection")
    prefs = {'safebrowsing.enabled': 'true'}
    options.add_experimental_option("prefs", prefs)
    instance = self.create_webdriver('Chrome', desired_capabilities=options.to_capabilities())
    self.go_to(url)
You need to add all the arguments to a list, then pass this list into a dictionary and pass that to Open Browser.
Example:
${list} =    Create List    --start-maximized    --disable-web-security
${args} =    Create Dictionary    args=${list}
${desired caps} =    Create Dictionary    platform=${OS}    chromeOptions=${args}
Open Browser    https://www.google.com    remote_url=${grid_url}    browser=${BROWSER}    desired_capabilities=${desired caps}
A simpler solution:
Open Browser    ${URL}    ${BROWSER}    options=add_argument("--disable-notifications")
For multiple options, separate them with semicolons:
options=add_argument("--disable-popup-blocking"); add_argument("--ignore-certificate-errors")
It is better not to disable security features or any other browser defaults just to solve one problem (unless there is ample justification). It is better to find a solution that doesn't touch them at all: just use Python's requests module, wrapped as a keyword you can call wherever you need it in your codebase. The reason for this approach is that it is better to get the job done with ubiquitous modules than to spend extensive time on one module. I used to do that; now I install requests + the robotframework-requests library and others and just get the job done.
Just use the code below to create a keyword and call it wherever you want, instead of going through the hassle of fixing browser behavior.
import requests
file_url = "http://www.africau.edu/images/default/sample.pdf"
r = requests.get(file_url, stream=True)
with open("sample.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # writing one chunk at a time to pdf file
        if chunk:
            pdf.write(chunk)
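If you would rather not add a dependency, the standard library can do the same streaming download. A sketch using `urllib.request` and `shutil.copyfileobj` (`download_file` is a hypothetical name). Saved in a Python module that Robot Framework imports as a library, a module-level function like this should be usable as a `Download File` keyword:

```python
import shutil
import urllib.request


def download_file(url, dest_path, chunk_size=64 * 1024):
    """Stream url to dest_path without loading the whole body into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads and writes in chunk_size pieces.
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Usage:
# download_file("http://www.africau.edu/images/default/sample.pdf", "sample.pdf")
```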
This worked for me (must use SeleniumLibrary 4). Modify Chrome so it downloads PDFs instead of viewing them:
${chrome_options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
${disabled}    Create List    Chrome PDF Viewer    PrintFileServer
${prefs}    Create Dictionary    download.prompt_for_download=${FALSE}    plugins.always_open_pdf_externally=${TRUE}    plugins.plugins_disabled=${disabled}
Call Method    ${chrome_options}    add_experimental_option    prefs    ${prefs}
${desired_caps}=    Create Dictionary    browserName=${browserName}    version=${version}    platform=${platform}    screenResolution=${screenResolution}    record_video=${record_video}    record_network=${record_network}    build=${buildNum}    name=${globalTestName}
Open Browser    url=${LOGINURL}    remote_url=${remote_url}    options=${chrome_options}    desired_capabilities=${desired_caps}
I am trying to get the video URL from links on this page. The video link can be seen at https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html (open it in Chrome).
For that I wrote the following Chrome WebDriver code:
import os
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from pyvirtualdisplay import Display
chromedriver = '/usr/local/bin/chromedriver'
os.environ['webdriver.chrome.driver'] = chromedriver
display = Display(visible=0, size=(800,600))
display.start()
driver = webdriver.Chrome(chromedriver)
driver.get('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html')
# Excerpt from a method: `self`, `item` and `return` belong to the enclosing code.
try:
    element = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements_by_class_name('yvp-main'))
    self.yahoo_video_trend = []
    for s in driver.find_elements_by_class_name('yvp-main'):
        print("Processing link - ", item['link'])
        trend = item
        print(item['description'])
        trend['video_link'] = s.find_element_by_tag_name('video').get_attribute('src')
        print()
        print(s.find_element_by_tag_name('video').get_attribute('src'))
        self.yahoo_video_trend.append(trend)
except:
    return
This works fine on my local system, but when I run it on my Azure server it does not give any result at s.find_element_by_tag_name('video').get_attribute('src').
I have installed Chrome on my Azure server.
Update:
Please note that I already tried requests and BeautifulSoup, but since Yahoo loads the HTML content dynamically from JSON, I could not get it with them.
And yes, the Azure server is a simple Linux system with command-line access, not an application.
I tried to reproduce your issue using your code. However, I found there was no tag named video in that page ('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html') when testing with IE and Chrome.
I used the developer tools to check the HTML code.
It seems that this page uses the Flash player to play the video, not an HTML5 video element.
For this reason, I suggest checking whether your code uses the right tag name.
If you have any concerns, please feel free to let me know.
We tried to reproduce the error on our side. I was not able to get the Chrome driver to work, but I did try the Firefox driver and it worked fine. It was able to load the page and get the link via the URL.
Can you change your code to print the exception and send it to us, so we can see where the script is failing?
Change your code from:
except:
    return
to:
try:
    # ... your existing code ...
except Exception as e:
    print(str(e))
Send us the exception, so we can take a look.
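A plain `str(e)` can be terse; the full traceback also shows which line failed. A minimal sketch (`run_and_report` is a hypothetical helper) that returns the traceback text instead of swallowing it, assuming you move the scraping block into a function you can pass in:

```python
import traceback


def run_and_report(func, *args, **kwargs):
    """Call func; return (result, None) on success or (None, traceback_text).

    A bare `except: return` hides exactly the information this keeps.
    """
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() includes the exception type, message and failing line.
        return None, traceback.format_exc()


# Usage (scrape_videos is a placeholder for your own function):
# result, error = run_and_report(scrape_videos, driver)
# if error:
#     print(error)
```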
I want to extract some data from Amazon (link in the following code).
Here is my code:
import urllib2
url="http://www.amazon.com/s/ref=sr_nr_n_11?rh=n%3A283155%2Cn%3A%2144258011%2Cn%3A2205237011%2Cp_n_feature_browse-bin%3A2656020011%2Cn%3A173507&bbn=2205237011&sort=titlerank&ie=UTF8&qid=1393984161&rnid=1000"
webpage=urllib2.urlopen(url).read()
doc=open("test.html","w")
doc.write(webpage)
doc.close()
When I open test.html, the content of the page is different from the website on the Internet.
The page involves JavaScript execution.
urllib2.urlopen(..).read() simply reads the URL content without running any JavaScript, so they are different.
To get the same content, you need to use a library that can handle JavaScript.
For example, the following code uses Selenium:
import io
from selenium import webdriver
url = 'http://www.amazon.com/s/ref=sr_nr_n_11?...161&rnid=1000'
driver = webdriver.Firefox()
driver.get(url)
# io.open with an explicit encoding writes the unicode page source on Python 2 and 3
with io.open('test.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)
driver.quit()
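To see why test.html differs, a small Python 3 sketch (`fetch_raw_html` is a hypothetical name; `urllib.request` is the Python 3 successor of `urllib2`): it returns the source exactly as served, so content that a `<script>` would add in a browser is simply absent.

```python
import urllib.request


def fetch_raw_html(url):
    """Return the page source exactly as the server sent it.

    urlopen() does not run JavaScript, so anything a <script> adds to the
    page in a browser is missing from this text.
    """
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```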
To complement falsetru's answer:
Another solution is to use python-ghost. It is based on Qt. It's much heavier to install, so I'd recommend Selenium too.
Using Firefox opens a browser window when the script runs. To keep it out of your way, use PhantomJS:
apt-get install nodejs # you get npm, the Node Package Manager
npm install -g phantomjs # install globally
[…]
driver = webdriver.PhantomJS()
How do I print a webpage using Selenium?
import time
from selenium import webdriver
# Initialise the webdriver
chromeOps = webdriver.ChromeOptions()
chromeOps.binary_location = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
chromeOps.add_argument("--enable-internal-flash")
browser = webdriver.Chrome("C:\\Program Files\\Google\\Chrome\\Application\\chromedriver.exe", port=4445, chrome_options=chromeOps)
time.sleep(3)
# Login to Webpage
browser.get('http://www.webpage.com')
Note: I am using what is, at present, the current version of Google Chrome: Version 32.0.1700.107 m
While it's not directly printing the webpage, it is easy to take a screenshot of the entire current page:
browser.save_screenshot("screenshot.png")
Then the image can be printed using any image printing library. I haven't personally used any such library so I can't necessarily vouch for it, but a quick search turned up win32print which looks promising.
The key "trick" is that we can execute JavaScript in the Selenium browser window using the execute_script method of the Selenium webdriver, and if you execute the JavaScript command "window.print();" it will activate the browser's print function.
Now, getting it to work elegantly requires setting a few preferences to print silently, remove print progress reporting, etc. Here is a small but functional example that loads and prints whatever website you put in the last line (where 'http://www.cnn.com/' is now):
import time
from selenium import webdriver
import os
class printing_browser(object):
    def __init__(self):
        self.profile = webdriver.FirefoxProfile()
        self.profile.set_preference("services.sync.prefs.sync.browser.download.manager.showWhenStarting", False)
        self.profile.set_preference("pdfjs.disabled", True)
        self.profile.set_preference("print.always_print_silent", True)
        self.profile.set_preference("print.show_print_progress", False)
        self.profile.set_preference("browser.download.show_plugins_in_list", False)
        self.driver = webdriver.Firefox(self.profile)
        time.sleep(5)
    def get_page_and_print(self, page):
        self.driver.get(page)
        time.sleep(5)
        self.driver.execute_script("window.print();")
if __name__ == "__main__":
    browser_that_prints = printing_browser()
    browser_that_prints.get_page_and_print('http://www.cnn.com/')
The key command you were probably missing was self.driver.execute_script("window.print();"), but it needs some of the setup in __init__ to run smoothly, so I thought I'd give a fuller example. I think the trick alone is in a comment above, so some credit should go there too.