Selenium PhantomJS save screenshot not getting the correct page - python

I have the following Python code to take screenshots of webpages. It works well in most cases, but when I try to capture
http://www.totalwine.com/wine/red-wine/pinot-noir/c/000018
I get a screenshot that differs from the actual page (though I do get the correct screenshot sometimes). I've attached the output screenshot I got as well. Please load the above link in a browser and you will see a different page.
I'm thinking the possible reasons could be:
1) Timing of the page load
2) A popup
Can someone help?
from selenium import webdriver

def screenshot_util(url):
    browser = webdriver.PhantomJS(service_log_path='ghostdriver.log')
    browser.set_window_size(1024, 768)
    browser.get(url)
    browser.save_screenshot('temp.png')
    print(browser.current_url)
    browser.quit()
    return

url_to_print = 'http://www.totalwine.com/wine/red-wine/pinot-noir/c/000018'
screenshot_util(url_to_print)

Programmatically click the yes button on the pop-up and wait a few seconds like this:
from selenium import webdriver
from time import sleep
def screenshot_util(url):
    browser = webdriver.PhantomJS(service_log_path='ghostdriver.log')
    browser.set_window_size(1024, 768)
    browser.get(url)
    browser.find_element_by_id("btnYes").click()
    sleep(4)
    browser.save_screenshot('temp.png')
    print(browser.current_url)
    browser.quit()
    return
url_to_print = 'http://www.totalwine.com/wine/red-wine/pinot-noir/c/000018'
screenshot_util(url_to_print)
For clarity, the lines I added to your code were:
from time import sleep
...
browser.find_element_by_id("btnYes").click()
sleep(4)
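If the fixed sleep proves flaky, an explicit wait is more robust: it polls until the button is actually clickable instead of hoping four seconds is enough. A sketch, reusing the btnYes id from above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the popup button to be clickable, then click it
WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.ID, "btnYes"))
).click()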

Your code was nearly perfect but had a small bug. You are taking a screenshot and saving it simply as temp.png, so webdriver gets the file name for the screenshot but has no explicit location in which to save it.
browser.save_screenshot('temp.png')
Solution:
As a solution I have used your own code and provided a relative path to a Screenshots sub-directory (./Screenshots/) that I had already created under my project space. The driver happily saves the screenshot there as temp.png. Here is the modified code block:
from selenium import webdriver
def screenshot_util(url):
    browser = webdriver.PhantomJS(service_log_path='./Logs/logs.log', executable_path="C:\\Utility\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe")
    browser.set_window_size(1024, 768)
    browser.get(url)
    browser.save_screenshot('./Screenshots/temp.png')
    print(browser.current_url)
    browser.quit()
    return
url_to_print = 'http://www.totalwine.com/wine/red-wine/pinot-noir/c/000018'
screenshot_util(url_to_print)
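Note that save_screenshot returns False rather than raising when the file cannot be written, so a defensive variant is to create the target directory up front and check the result (a small sketch to drop into the function above):

import os

os.makedirs('./Screenshots', exist_ok=True)  # create the directory if it is missing
if not browser.save_screenshot('./Screenshots/temp.png'):
    print('Screenshot could not be saved - check the path')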

Related

Scraping only the portion that loads - Without Scrolling

I have written a simple web-scraping script using Selenium, but I want to scrape only the portion that is present 'before scrolling'.
Say I want to scrape this page - https://en.wikipedia.org/wiki/Pandas_(software) - Selenium reads information down to the absolute last element/text, which for me is the 'Powered by MediaWiki' button at the far bottom-right of the page.
What I want Selenium to do is stop after DataFrames (see screenshot) and not scroll down to the bottom.
I also want to know where on the page it stops. I have checked multiple sources, and most of them deal with infinite-scroll websites. No one asks about just the 'visible' half of a page.
This is my code now:
from selenium import webdriver

EXECUTABLE = r"chromedriver.exe"
# get the URL
url = "https://en.wikipedia.org/wiki/Pandas_(software)"
# open the chromedriver
driver = webdriver.Chrome(executable_path=EXECUTABLE)
# the window is maximized so that all webpages are rendered at the same size
driver.maximize_window()
# let the driver wait up to 30 seconds before throwing a time-out exception
driver.implicitly_wait(30)
# get URL
driver.get(url)
for element in driver.find_elements_by_xpath("//*"):
    try:
        pass  # stuff
    except Exception:
        continue
driver.close()
Absolutely any direction is appreciated. I have tried to be as clear as possible here but let me know if any more details are required.
I don't think that is possible. Observe the DOM: all the informational elements are under one section, the tag div[@id='content'], which is already visible to Selenium. Even if you try with //*, div[@id='content'] is visible.
And checking whether an element is visible even though you have not scrolled to it will also return True. (If someone knows how to do what you are asking for, even I would like to know.)
from selenium import webdriver
from selenium.webdriver.support.expected_conditions import _element_if_visible

driver = webdriver.Chrome(executable_path='path to chromedriver.exe')
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://en.wikipedia.org/wiki/Pandas_(software)")
elements = driver.find_elements_by_xpath("//div[@id='content']//*")
for element in elements:
    try:
        if _element_if_visible(element):
            print(element.get_attribute("innerText"))
    except Exception:
        break
driver.quit()
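That said, one way to approximate the 'before scroll' portion is to compare each element's bounding box with the viewport height in JavaScript. A sketch, not a tested recipe; it relies only on the standard getBoundingClientRect DOM call and assumes the page has not been scrolled yet:

def in_initial_viewport(driver, element):
    # True if the element's box lies fully inside the un-scrolled viewport
    return driver.execute_script(
        "var r = arguments[0].getBoundingClientRect();"
        "return r.top >= 0 && r.bottom <= window.innerHeight;",
        element)

visible_elements = [e for e in elements if in_initial_viewport(driver, e)]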

Still getting download prompt with selenium after setting right MIME-type

Here's my code:
from selenium import webdriver
from time import sleep
import selenium
browser = webdriver.Firefox()
profile = webdriver.FirefoxProfile('C:\\path\\to\\the\\profile\\rust_mozprofile9nHEUU')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', '"application/pdf":{"action":0,"extensions":["pdf"]')
browser.get("https://amtsblatt.ag.ch/publikationen/")
elem = browser.find_element_by_id("pdf-select-all")
elem.click()
sleep(5)
elem2 = browser.find_element_by_class_name("pdf-list-head")
elem2.click()
sleep(5)
elem3 = browser.find_element_by_link_text("Exportieren")
elem3.click()
I want to get rid of the save/open prompt when downloading a PDF file from a website using Selenium and Firefox. I specifically looked up the right MIME type after accessing the site manually and downloading the PDF, and passed it as the second argument to profile.set_preference. However, I still get the save/open prompt and am completely lost. Where did I go wrong? Any help is greatly appreciated.
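For reference, the usual recipe is that browser.helperApps.neverAsk.saveToDisk takes a plain comma-separated list of MIME types (not JSON), the profile has to be created before the browser and actually passed to it, and Firefox's built-in PDF viewer must be disabled. A sketch of that setup; the download path is a placeholder:

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)  # 2 = use a custom download directory
profile.set_preference('browser.download.dir', r'C:\path\to\downloads')  # placeholder path
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')
profile.set_preference('pdfjs.disabled', True)  # skip the built-in PDF viewer
browser = webdriver.Firefox(firefox_profile=profile)  # the profile must be handed to Firefox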

Element not Clickable (Chrome + Selenium + Python)

I am using ChromeDriver/Selenium in Python.
I tried several solutions (actions, maximize window, etc.) to get rid of this exception, without success.
The error is:
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element ... is not clickable at point (410, 513). Other element would receive the click: ...
The code:
from selenium import webdriver
import time
url = 'https://www.tmdn.org/tmview/welcome#/tmview/detail/EM500000018203824'
driver = webdriver.Chrome(executable_path=r"D:\Python\chromedriver.exe")
driver.get(url)
time.sleep(30)
driver.find_element_by_link_text('Show more').click()
I tested this code on my Linux PC with the latest libraries, Python 3 and chromedriver. It works perfectly (to my mind). So try updating everything and running it again (and try not to touch the Chrome window while it runs). Here is the code:
from selenium import webdriver
import time
url = 'https://www.tmdn.org/tmview/welcome#/tmview/detail/EM500000018203824'
driver = webdriver.Chrome(executable_path = "chromedriver")
driver.get(url)
time.sleep(30)
driver.find_element_by_link_text('Show more').click()
P.S. chromedriver is in the same folder as the script.
Thank you for your assistance.
Actually, the issue was the 'We use cookies...' panel in the footer of the web page, which overlaps the 'Show more' link; when the driver tries to click, the click is intercepted by that panel.
The solution is to close that panel first, after which the code works fine.
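Something along these lines; a sketch only, since the accept-button selector below is hypothetical and has to be looked up in DevTools for the real page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# close the cookie banner first (placeholder selector), then the click goes through
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, '.cookie-banner .accept'))
).click()
driver.find_element_by_link_text('Show more').click()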
The code is working fine, but if you manually click on some other element after the page has loaded, before the sleep time is over, you can recreate the same error.
For example, after the site loaded I clicked on the element 'search for trade mark with similar image', and then Selenium was not able to find the 'Show more' element. So maybe some other part of your code is clicking on another element and loading a different URL, which is why Selenium generates this error. Your code is fine; just check for conflict cases.

Taking Web screenshot using imgkit

I was trying to take screenshots using imgkit as follows:
import imgkit

options = {
    'width': 1000,
    'height': 1000
}
imgkit.from_url('https://www.ozbargain.com.au/', 'out1.jpg', options=options)
What I am getting is shown in the attached screenshot. The actual look is a bit different. Possibly this is due to JavaScript not being executed (it's a guess). Could you please tell me a way to do this with imgkit? Any suggested library would be helpful too.
You could use Selenium to control a web browser (Chrome or Firefox), which can run JavaScript, and the browser has a function to take a screenshot. But JavaScript may display message windows which you may have to close using click() in code; you would have to find manually (in DevTools in the browser) the class name, id or other value that helps Selenium recognize the button on the page.
from selenium import webdriver
from time import sleep
#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get('https://www.ozbargain.com.au/')
driver.set_window_size(1000, 1000)
sleep(2)
# close first message
driver.find_element_by_class_name('qc-cmp-button').click()
sleep(1)
# close second message with details
driver.find_element_by_class_name('qc-cmp-button.qc-cmp-save-and-exit').click()
sleep(1)
driver.get_screenshot_as_file("screenshot.png")
#driver.quit()
Alternatively, you could use PyAutoGUI or mss to take a screenshot of the full desktop or of some region of it.
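For instance, a minimal mss sketch (assumes pip install mss; it captures a monitor, not a single browser window):

from mss import mss

with mss() as sct:
    # grab monitor 1 and write it straight to a PNG file
    sct.shot(mon=1, output='desktop.png')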

How to send CTRL-P to a web browser in Python

The aim of this is to open a browser window and save the site as a PDF.
I'm writing Python code that:
1) Opens a web page
2) Sends a Ctrl-P to bring up the print dialog box
NOTE: I will have pre-configured the browser to save as PDF instead of defaulting to printing to a printer
3) Sends "return"
4) Enters the file name
5) Sends "return" again
NOTE: In my full code, I'll be doing these steps hundreds of times
I'm having a problem early on with Ctrl-P. As a test, I'm able to send dummy text to Google's search box, but I can't seem to send a Ctrl-P (no error messages). I'm using Google as an easy example, but my final code will use various other sites.
Obviously I'm missing something but just can't figure out what.
I tried an alternate method using JavaScript instead of ActionChains:
driver.execute_script('window.print();')
This brought up the print dialog, but I wasn't able to feed anything else into that dialog box (like the file name and location for the PDF).
I tried PDFkit to convert the web page into a PDF. It worked on some sites, but it crashed often (depending on what the site returned), the pages were sometimes poorly formatted, and some sites (Pinterest) just didn't render at all. For this reason I changed method and decided to use Selenium and Chrome so that the PDF renders just like the page shows in the browser.
I thought about using element.send_keys("some text") instead of ActionChains, but since I'm going across multiple different web sites, I don't necessarily know which element to look for.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import time
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
URL = "http://www.google.com"
driver.get(URL)
actions = ActionChains(driver)
time.sleep(5) #Give the page time to load
actions.key_down(Keys.CONTROL)
actions.send_keys('p')
actions.key_up(Keys.CONTROL)
actions.perform()
time.sleep(5) #Sleep to see if the print dialog came up
driver.quit()
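For what it's worth, headless Chrome can skip the print dialog entirely through the DevTools Page.printToPDF command. A sketch; it assumes a Selenium version that exposes execute_cdp_cmd and requires headless mode, since printToPDF only works there:

import base64
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('http://www.google.com')
# DevTools returns the rendered PDF as base64-encoded data
result = driver.execute_cdp_cmd('Page.printToPDF', {'printBackground': True})
with open('page.pdf', 'wb') as f:
    f.write(base64.b64decode(result['data']))
driver.quit()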
You can use AutoIt to achieve your requirement.
First do pip install -U pyautoit
from selenium import webdriver
import autoit
import time
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('http://google.com')
driver.maximize_window()
time.sleep(10)
autoit.send("^p")
time.sleep(10) # Pause to allow you to inspect the browser.
driver.quit()
Please let me know if it's working.
try this:
webdriver.ActionChains(driver).key_down(Keys.CONTROL).send_keys('p').key_up(Keys.CONTROL).perform()
check this out (this one is Java, using the AWT Robot class, but the same idea applies):
robot.keyPress(KeyEvent.VK_CONTROL)
robot.keyPress(KeyEvent.VK_P)
// CTRL+P is now pressed
robot.keyRelease(KeyEvent.VK_P)
robot.keyRelease(KeyEvent.VK_CONTROL)
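A Python equivalent of that Robot snippet, sketched with pyautogui (assumes pip install pyautogui and that the browser window currently has focus):

import pyautogui

# press and release Ctrl+P as one chord, like the Robot calls above
pyautogui.hotkey('ctrl', 'p')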
