How do I print a webpage using Selenium?
import time
from selenium import webdriver

# Initialise the webdriver
chromeOps = webdriver.ChromeOptions()
chromeOps.binary_location = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
chromeOps.add_argument("--enable-internal-flash")
browser = webdriver.Chrome("C:\\Program Files\\Google\\Chrome\\Application\\chromedriver.exe", port=4445, chrome_options=chromeOps)
time.sleep(3)

# Log in to webpage
browser.get('http://www.webpage.com')
Note: I am using the current version of Google Chrome at the time of writing: Version 32.0.1700.107 m.
While this doesn't directly print the webpage, it is easy to take a screenshot of the current page:
browser.save_screenshot("screenshot.png")
The image can then be printed using any image-printing library. I haven't personally used such a library, so I can't vouch for one, but a quick search turned up win32print, which looks promising.
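For example, on Windows the saved screenshot could be sent to the default printer with pywin32 and Pillow. This is an untested sketch based on the common win32ui/ImageWin printing recipe, not something from the original answer:
import win32print
import win32ui
from PIL import Image, ImageWin

printer_name = win32print.GetDefaultPrinter()

# Create a device context for the default printer
hdc = win32ui.CreateDC()
hdc.CreatePrinterDC(printer_name)

img = Image.open("screenshot.png")

hdc.StartDoc("screenshot.png")
hdc.StartPage()
# Scale the image to the printable area of the page
printable_width = hdc.GetDeviceCaps(8)    # HORZRES
printable_height = hdc.GetDeviceCaps(10)  # VERTRES
dib = ImageWin.Dib(img)
dib.draw(hdc.GetHandleOutput(), (0, 0, printable_width, printable_height))
hdc.EndPage()
hdc.EndDoc()
hdc.DeleteDC()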
The key "trick" is that we can execute JavaScript in the selenium browser window using the "execute_script" method of the selenium webdriver, and if you execute the JavaScript command "window.print();" it will activate the browsers print function.
Now, getting it to work elegantly requires setting a few preferences to print silently, suppress print-progress reporting, etc. Here is a small but functional example that loads and prints whatever website you put in the last line (currently 'http://www.cnn.com/'):
import time
from selenium import webdriver
import os

class printing_browser(object):
    def __init__(self):
        self.profile = webdriver.FirefoxProfile()
        self.profile.set_preference("services.sync.prefs.sync.browser.download.manager.showWhenStarting", False)
        self.profile.set_preference("pdfjs.disabled", True)
        self.profile.set_preference("print.always_print_silent", True)
        self.profile.set_preference("print.show_print_progress", False)
        self.profile.set_preference("browser.download.show_plugins_in_list", False)
        self.driver = webdriver.Firefox(self.profile)
        time.sleep(5)

    def get_page_and_print(self, page):
        self.driver.get(page)
        time.sleep(5)
        self.driver.execute_script("window.print();")

if __name__ == "__main__":
    browser_that_prints = printing_browser()
    browser_that_prints.get_page_and_print('http://www.cnn.com/')
The key command you were probably missing was self.driver.execute_script("window.print();"), but you need some of the setup in __init__ to make it run smoothly, so I thought I'd give a fuller example. The trick alone appears in a comment above, so some credit should go there too.
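If you need to stay with Chrome (as in the question's setup), a similar silent-print effect can reportedly be achieved with Chrome's --kiosk-printing flag, which skips the print preview and sends the job straight to the default printer. A minimal, untested sketch:
import time
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--kiosk-printing")  # print immediately to the default printer
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get("http://www.cnn.com/")
time.sleep(5)  # crude wait for the page to finish loading
driver.execute_script("window.print();")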
I've been stuck on a task for a few days: I can't get images to load automatically into Vinted's image upload dialog. I tried running the following code:
from os import listdir
from os.path import isfile, join
from time import sleep
from pyautogui import press, write
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

def get_images(directory_path) -> str:
    images: str = ""
    f: str
    image_name: str
    for filename in listdir(directory_path):
        f = join(directory_path, filename)
        if isfile(f):
            image_name = f.replace(f"{directory_path}\\", "")
            images += f'\"{image_name}\" '
    return directory_path + "\\" + images
option: Options = Options()
option.add_experimental_option("debuggerAddress", "localhost:8989")
driver: Chrome = Chrome(service=Service(ChromeDriverManager().install()),
options=option)
mode: str = By.CSS_SELECTOR
driver.maximize_window()
driver.get("https://www.vinted.it/items/new")
# * images
sleep(1)
driver.find_element(mode, "#photos > div.Cell_cell__3V4ao.Cell_wide__1ukxw > div > div > div > div.media-select__input > div > button").click()
sleep(1)
directory_path: str = r"C:\Users\Memmo\Pictures\Camera Roll"
write(get_images(directory_path))
press('enter')
The problem is that the paths of the selected images end up in the terminal where the script is run, whereas they should be typed into the upload window. It almost seems that focus is lost.
I could also set the image on the upload section through the HTML directly, but that seems more complicated, expensive and risky than the approach already taken.
If someone has already dealt with "more custom" image pickers compared to the classic ones, I would be curious to know how you solved this problem. Thanks in advance.
To upload the file you need to use the XPath "//div[@id='photos']/input".
Here is the full code:
driver.find_element(By.XPATH, "//div[@id='photos']/input").send_keys("<your image file path>")
You cannot access the upload window opened by the browser, since Selenium does not have access outside of the browser page.
You can achieve this by installing another Python package that gives you access to the opened window (given that you are starting the browser on the local machine), or, a much simpler way, by locating the file input field (in most cases hidden) and sending the image path to it.
Here is a small example (I have not tried it on Vinted; I don't have an account there and I'm too lazy to verify my phone number :)):
# [...]
# Get the file input field.
# (On most pages css: input[type="file"] or xpath: //input[@type="file"]
# would be enough, but check whether there is more than one file input field,
# in which case you'd have to use an index like [0], [1], [n].)
# Add your element-waiting logic here.
# [...]
fileInput = driver.find_element(mode, 'input[type="file"]')
# Note: send_keys expects an absolute file path (newline-separated paths for multiple files).
fileInput.send_keys(get_images(directory_path))
# And finally click the form's submit button.
# [...]
If losing focus on a desktop window is causing the problem, it can easily be fixed as follows:
Create a .vbs file with the following code at some location:
Set oShell = CreateObject("Wscript.shell")
oShell.AppActivate("<Enter the Window Title Here>")
Before sending any keys, simply invoke the .vbs file to set the focus.
P.S.
This solution only works on Windows.
A partial window title also works.
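As a rough illustration of that idea (not from the original answer), the .vbs file could be invoked from Python right before typing the file names; the script path and window title below are placeholders you would need to adjust:
import subprocess
from pyautogui import press, write

# Hypothetical path to the .vbs file described above; its AppActivate call
# should name the title of the upload dialog (e.g. "Open" / "Apri").
subprocess.call(["wscript.exe", r"C:\scripts\focus_upload_window.vbs"])

write(get_images(directory_path))  # get_images / directory_path as in the question
press('enter')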
I was unable to log in to the website, but I was able to find some libraries for getting info: https://github.com/aime-risson/vinted-api-wrapper and https://github.com/hipsuc/Vinted-API/blob/main/VintedApi.py
I'm trying to use Tor with Selenium, which works through tbselenium.
However, when loading a URL or clicking a web element, the page immediately closes once the action finishes, instead of remaining open as it would when using Selenium with Chrome.
Any ideas on how to keep the page open?
import tbselenium.common as cm
from tbselenium.tbdriver import TorBrowserDriver
from tbselenium.utils import launch_tbb_tor_with_stem

tbb_dir = "C:\\pathto\\Tor Browser\\"
tor_process = launch_tbb_tor_with_stem(tbb_path=tbb_dir)
for i in range(1):
    with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
        driver.load_url("http://hln.be", 3, wait_for_page_body=True)
        #driver.get('https://google.be')
        try:
            policypage = driver.find_element_by_xpath("//a[contains(@href,'members/join')]")
            policypage.click()
            usern = driver.find_element_by_xpath("//input[contains(@id,'user_member_username')]")
            usern.send_keys('Tryout')
        except:
            print('different look')
As Furas said, use the standard driver declaration: instantiate the driver directly instead of inside a with block, because the with block quits the browser as soon as it exits.
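A minimal sketch of what that could look like, reusing the same tbselenium calls as the question (untested):
import tbselenium.common as cm
from tbselenium.tbdriver import TorBrowserDriver
from tbselenium.utils import launch_tbb_tor_with_stem

tbb_dir = "C:\\pathto\\Tor Browser\\"
tor_process = launch_tbb_tor_with_stem(tbb_path=tbb_dir)

# No context manager, so the browser stays open until quit() is called explicitly
driver = TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM)
driver.load_url("http://hln.be", 3, wait_for_page_body=True)
# ... interact with the page here ...

# When finished:
# driver.quit()
# tor_process.kill()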
I am taking a trial-website case to learn how to upload files using Python Selenium when the upload window is not part of the HTML but a system-level dialog. This has already been solved using Java (Stack Overflow link below). If it is not possible via Python, then I intend to switch to Java for this task.
BUT,
Dear fellow Python lovers, why shouldn't it be possible using Python's Selenium webdriver? Hence this quest.
Solved in Java for URL: http://www.zamzar.com/
Solution (& Java code) on Stack Overflow: How to handle windows file upload using Selenium WebDriver?
This is my Python code, which should be self-explanatory and includes Chrome webdriver download links.
The task (uploading a file) I am trying, in brief:
Website: https://www.wordtopdf.com/
Note_1: I don't need this tool for any real work, as there are far better packages for Word-to-PDF conversion. This is just for learning and polishing Python Selenium code.
Note_2: You will have to enter 2 paths into my code below after downloading and unzipping the Chrome driver (links below in the comments). The 2 paths are: [a] the path of a(/any) Word file and [b] the path of the unzipped Chrome driver.
My Code:
import time
import numpy as np
from selenium import webdriver

UNZIPPED_DRIVER_PATH = 'C:/Users/....'  # You need to specify this on your computer
driver = webdriver.Chrome(executable_path=UNZIPPED_DRIVER_PATH)
# Driver download links below (check which version of Chrome you are using if you don't know it beforehand):
# Chrome Driver 74 Download: https://chromedriver.storage.googleapis.com/index.html?path=74.0.3729.6/
# Chrome Driver 73 Download: https://chromedriver.storage.googleapis.com/index.html?path=73.0.3683.68/
New_Trial_URL = 'https://www.wordtopdf.com/'
driver.get(New_Trial_URL)
time.sleep(np.random.uniform(4.5, 5.5))  # Time to load the page in peace
Find_upload = driver.find_element_by_xpath('//*[@id="file-uploader"]')
WORD_FILE_PATH = 'C:/Users/..../some_word_file.docx'  # You need to specify this on your computer
Find_upload.send_keys(WORD_FILE_PATH)  # Not working, no action happens here
Based on something very similar in Java (How to handle windows file upload using Selenium WebDriver?), this should work like a charm. But voila... total failure, and thus a chance to learn something new.
I have also tried:
Click_Alert = Find_upload.click()
Click_Alert(driver).send_keys(WORD_FILE_PATH)
That did not work either. 'Alert' should be a built-in class as per these 2 links (https://seleniumhq.github.io/selenium/docs/api/py/webdriver/selenium.webdriver.common.alert.html & Selenium-Python: interact with system modal dialogs), but the 'Alert' function in the above links doesn't seem to exist in my Python setup even after executing
from selenium import webdriver
To all readers: I hope this doesn't take much of your time and we all get to learn something out of this.
Cheers
You get ('//*[@id="file-uploader"]'), which is an <a> tag,
but there is a hidden <input type="file"> (behind the <a>) which you have to use instead:
import selenium.webdriver
your_file = "/home/you/file.doc"
your_email = "you#example.com"
url = 'https://www.wordtopdf.com/'
driver = selenium.webdriver.Firefox()
driver.get(url)
file_input = driver.find_element_by_xpath('//input[@type="file"]')
file_input.send_keys(your_file)
email_input = driver.find_element_by_xpath('//input[@name="email"]')
email_input.send_keys(your_email)
driver.find_element_by_id('convert_now').click()
Tested with Firefox 66 / Linux Mint 19.1 / Python 3.7 / Selenium 3.141.0
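(If you are on Selenium 4, where the find_element_by_* helpers were removed, the equivalent lookups would presumably go through the By API, roughly like this:)
from selenium.webdriver.common.by import By

file_input = driver.find_element(By.XPATH, '//input[@type="file"]')
email_input = driver.find_element(By.XPATH, '//input[@name="email"]')
driver.find_element(By.ID, 'convert_now').click()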
EDIT: The same method for uploading on zamzar.com.
A situation I saw for the first time (so it took me longer to create a solution): the page has an <input type="file"> hidden under the button, but it doesn't use it to upload the file. It dynamically creates a second <input type="file"> which it then uses to upload the file (or maybe even many files; I didn't test it).
import selenium.webdriver
from selenium.webdriver.support.ui import Select
import time
your_file = "/home/furas/Obrazy/37884728_1975437959135477_1313839270464585728_n.jpg"
#your_file = "/home/you/file.jpg"
output_format = 'png'
url = 'https://www.zamzar.com/'
driver = selenium.webdriver.Firefox()
driver.get(url)
#--- file ---
# it has to wait because the page has to create the second `input[@type="file"]`
file_input = driver.find_elements_by_xpath('//input[@type="file"]')
while len(file_input) < 2:
    print('len(file_input):', len(file_input))
    time.sleep(0.5)
    file_input = driver.find_elements_by_xpath('//input[@type="file"]')
file_input[1].send_keys(your_file)
#--- format ---
select_input = driver.find_element_by_id('convert-format')
select = Select(select_input)
select.select_by_visible_text(output_format)
#--- convert ---
driver.find_element_by_id('convert-button').click()
#--- download ---
time.sleep(5)
driver.find_elements_by_xpath('//td[@class="status last"]/a')[0].click()
I want to scrape Bing's search results. Basically, I am using Selenium: the idea is to use Selenium to click 'Next' automatically and scrape the URLs of the search results on each page. I got it running with the Chrome browser on my Ubuntu machine:
import time
from selenium import webdriver
import os
class bingURL(object):
    def __init__(self):
        self.driver = webdriver.Chrome(os.path.expanduser('./chromedriver'))

    def get_urls(self, url):
        driver = self.driver
        driver.get(url)
        elems = driver.find_elements_by_xpath("//a[@href]")
        href = []
        for elem in elems:
            link = elem.get_attribute("href")
            try:
                if 'bing.com' not in link and 'http' in link and 'microsoft.com' not in link and 'smashboards.com' not in link:
                    href.append(link)
            except:
                pass
        return list(set(href))

    def search_urls(self, keyword, pagenum):
        driver = self.driver
        searchurl = self.lookup(keyword)  ### url of the first page of Bing search
        driver.get(searchurl)
        results = self.get_urls(searchurl)
        for i in range(pagenum):
            driver.find_elements_by_class_name("sb_pagN")[0].click()  # click 'Next' on the Bing search results
            time.sleep(5)  # wait for the page to load
            current_url = driver.current_url
            #print(current_url)
            #print(self.get_urls(current_url))
            results[0:0] = self.get_urls(current_url)
        driver.quit()
        return results

    def lookup(self, query):
        return "https://www.bing.com/search?q=" + query

if __name__ == "__main__":
    g = bingURL()
    result = g.search_urls('Stackoverflow is good', 10)
It works perfectly: when I run the code, it launches a Chrome browser, I can see it go to the next page automatically, and it collects URLs for 10 pages of search results.
However, my goal is to run this code on AWS. The original code fails there with the error 'Chrome failed to start'. After googling, it seems I need to use a headless browser such as PhantomJS on AWS. So I installed PhantomJS and changed def __init__(self): to:
def __init__(self):
    self.driver = webdriver.PhantomJS()
However, it cannot click 'Next' anymore and cannot scrape URLs with the old code. The error message is:
File ".../SEARCH_BING_MODULE.py", line 70, in search_urls
driver.find_elements_by_class_name("sb_pagN")[0].click()
IndexError: list index out of range
It looks like changing the browser completely changes the rules. How should I modify my original code to make it work again? Or how can I scrape the URLs of Bing search results using Selenium + PhantomJS?
Thanks for your help!
Yes, you can perform all the operations from your 3 points using a headless browser. Don't use HtmlUnit as it has many configuration issues.
PhantomJS was another approach to a headless browser, but PhantomJS has bugs these days because it is poorly maintained.
You can use chromedriver itself for headless jobs.
You just need to pass one option to chromedriver, as below:
chromeOptions.addArguments("--headless");
The full code will look like this:
System.setProperty("webdriver.chrome.driver","D:\\Workspace\\JmeterWebdriverProject\\src\\lib\\chromedriver.exe");
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--start-maximized");
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://www.google.co.in/");
Hope it will help you :)
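(The snippet above is Java; for the Python code in the question, a rough equivalent of the same headless idea, matching the question's Selenium 3 style, would be the following sketch:)
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")  # a sensible viewport for headless runs
driver = webdriver.Chrome(os.path.expanduser('./chromedriver'), options=chrome_options)
driver.get("https://www.bing.com/search?q=Stackoverflow+is+good")
print(driver.title)
driver.quit()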
I am following this video to get myself familiar with Selenium. My code is:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from pyvirtualdisplay import Display
import os
chromedriver = "/usr/bin/chromedriver"
os.environ['webdriver.chrome.driver'] = chromedriver
display = Display(visible=0, size=(800,600))
display.start()
br = webdriver.Chrome(chromedriver)
br.get("http://www.google.com")
Now to print the results
q = br.find_element_by_name('q')
q.send_keys('python')
q.send_keys(Keys.RETURN)
print br.title
results = br.find_elements_by_class_name('g')
print results
for result in results:
    print result.text
    print "-"*140
The output I am getting is just 'python', and when I try to print results it is [].
When I try the code below in Chrome's JavaScript console, it works fine:
res = document.getElementsByClassName('g')[0]
<li class="g">…</li>
res.textContent
" Python Programming Language – Official Websitewww.python.org/Cached - SimilarShareShared on Google+. View the post.You +1'd this publicly. UndoHome page for Python, an interpreted, interactive, object-oriented, extensible programming language. It provides an extraordinary combination of clarity and ...CPython - Documentation - IDEs - GuiProgramming"
So, any idea why I am not getting any results with Selenium + Python?
Adding time.sleep(3) after q.send_keys(Keys.RETURN) seems to solve the problem. That's because when you press Keys.RETURN, AJAX starts working, and when you try to collect the results they aren't on the page yet. Selenium, as far as I know, has no straightforward way to determine whether scripts like this have finished executing.
I think it would be more reliable to do:
br.get("http://www.google.com/search?q=python")
results = br.find_elements_by_class_name('g')
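As an aside (not part of the original answer), a common way to avoid a fixed sleep would be an explicit wait for the result elements, for example:
from selenium.webdriver.support.ui import WebDriverWait

q.send_keys(Keys.RETURN)
# Wait up to 10 seconds until at least one element with class 'g' is present
WebDriverWait(br, 10).until(lambda d: d.find_elements_by_class_name('g'))
results = br.find_elements_by_class_name('g')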