Saving pages with Selenium

Saving pages with Selenium - python

I will try again.
I copied the code below from another site, where the user says it works (and shows a screenshot). Original code
I tested the code: there is no error, but no file is saved.
All the questions I found use this answer to save a file: A question!
Why is the page not saved or, if it is, where is the file?
Thanks
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Selenium\chromedriver.exe")
driver.get("http://www.example.com")
saveas = ActionChains(driver).key_down(Keys.CONTROL).send_keys('S').key_up(Keys.CONTROL)
saveas.perform()

from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Selenium\chromedriver.exe")
driver.get("http://www.example.com")
with open('page.html', 'w+') as f:
    f.write(driver.page_source)
This should work.

If you press the key combination in the browser yourself, you will see that it only brings up the 'save page' dialog box. You additionally need to send ALT+S to save the page. On Windows it will be saved in your Downloads folder by default.
saveas = ActionChains(driver).key_down(Keys.CONTROL).send_keys('S').key_up(Keys.CONTROL).send_keys('MyDocumentName').key_down(Keys.ALT).send_keys('S').key_up(Keys.ALT)
EDIT:
ActionChains are unreliable. It would be easier not to interact with the browser GUI.
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Selenium\chromedriver.exe")
driver.get("http://www.example.com")
with open('page.html', 'w') as f:
    f.write(driver.page_source)
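
A minimal sketch of the same idea as a reusable helper. The function name save_page_source and the explicit UTF-8 encoding are my assumptions, not part of the answer above; any object that exposes a page_source string, such as a Selenium driver, should work.

```python
from pathlib import Path


def save_page_source(driver, path):
    """Write driver.page_source to path as UTF-8 and return the absolute path."""
    target = Path(path).resolve()
    # An explicit encoding avoids UnicodeEncodeError on Windows, where the
    # default file encoding is often not UTF-8.
    target.write_text(driver.page_source, encoding="utf-8")
    return target


# Hypothetical usage with a real browser (requires selenium and a driver binary):
#     from selenium import webdriver
#     driver = webdriver.Chrome()
#     driver.get("http://www.example.com")
#     print(save_page_source(driver, "page.html"))
#     driver.quit()
```

Printing the returned path also covers the "where is the file?" part of the question: a relative name such as 'page.html' is written to the current working directory of the script.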

Related

Download PDF file from .cfm URL

I'm trying to download a PDF file from this address: https://aisweb.decea.mil.br/inc/notam/gerar-boletim/reports/report-notam.cfm
I wrote some code that first fills out some information on this page (correctly) https://aisweb.decea.mil.br/?i=notam and then clicks a button that opens a new tab with the generated PDF file. The problem is that when it tries to save the PDF file at the end, it downloads directly from the .cfm address, resulting in an empty PDF template (you can see this by clicking the first link).
How can I download the PDF that is currently being shown to me on the page, instead of accessing the first URL directly?
This is my code
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
from selenium.webdriver.common.print_page_options import PrintOptions
from urllib import request
from bs4 import BeautifulSoup
import re
import os
import urllib
import time
import requests
from urllib.parse import urljoin
aerodromos = "SBNT,SBJP,SBFZ,SBRF" #TEST
driver = webdriver.Chrome('C:\Windows\chromedriver.exe')
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)
driver.get("https://aisweb.decea.mil.br/?i=notam")
driver.maximize_window()
caixaTexto = driver.find_element(By.XPATH,'//*[@id="icaocode"]')
caixaTexto.send_keys(aerodromos)
botao = driver.find_element(By.XPATH, '//*[@id="a"]/form/div/div[3]/div/input[2]')
botao.click()
botao = driver.find_element(By.XPATH, '//*[@id="select-all"]')
botao.click()
botao = driver.find_element(By.XPATH, '/html/body/div/div/div/div/div[2]/div/div/form/input[3]')
botao.click()
response = urllib.request.urlretrieve('https://aisweb.decea.mil.br/inc/notam/gerar-boletim/reports/report-notam.cfm', filename='relatorio1.pdf')

I did it! Changing the Chrome settings to download PDFs instead of opening them made no difference, but I ended up finding a solution while searching for another way to do it:
Unable to access the modal elements to download pdf with selenium
I changed the Chrome experimental options profile in my code and it worked! Now it opens the tab, immediately downloads the file, and closes the tab.
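
The answer does not show which options were changed. The following is a sketch of one commonly used set of Chrome preferences, passed through add_experimental_option("prefs", ...). The helper names and the download directory are hypothetical, and it is an assumption that these are the preferences the author used.

```python
import os


def build_pdf_download_prefs(download_dir):
    """Return Chrome preferences that save PDFs instead of opening them."""
    return {
        # Where Chrome should put downloaded files.
        "download.default_directory": os.path.abspath(download_dir),
        # Do not ask where to save each file.
        "download.prompt_for_download": False,
        # Download PDFs rather than showing them in the built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_chrome_driver(download_dir):
    """Create a Chrome driver with the preferences above (needs selenium)."""
    # Imported lazily so build_pdf_download_prefs can be used on its own.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With such a driver, clicking the button that generates the report should make the browser itself download the PDF within the same session, instead of requesting the .cfm address separately with urlretrieve.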

Reopen same browser window using selenium python and Firefox

Hey, I'm trying to make an automatic program to send WhatsApp messages.
I'm currently using Python, Firefox and Selenium to achieve that.
The problem is that every time I call driver.get(url), it opens a new instance of the Firefox browser, blank, with no memory of the last run. It makes me scan the QR code every time I run it.
from selenium import webdriver
from selenium.webdriver.firefox.webdriver import FirefoxProfile
cp_profile = webdriver.FirefoxProfile("/Users/Hodai/AppData/Roaming/Mozilla/Firefox/Profiles/v27qat5d.whatsapp_profile")
driver = webdriver.Firefox(executable_path="/Users/Hodai/Desktop/geckodriver",firefox_profile=cp_profile)
driver.get('http://web.whatsapp.com')
#Scan the code before proceeding further
input('Enter anything after scanning QR code')
I've tried to use a profile, but it seems to have no effect.
cp_profile = webdriver.FirefoxProfile("/Users/Hodai/AppData/Roaming/Mozilla/Firefox/Profiles/v27qat5d.whatsapp_profile")
driver = webdriver.Firefox(executable_path="/Users/Hodai/Desktop/geckodriver",firefox_profile=cp_profile)

In the end I used chromedriver to achieve my goal.
I tried cookies with pickle, but it was a bit tricky because it remembered the cookies only for the same domain.
So I used the user data directory for Chrome.
Now it works like a charm. Thank you all.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("user-data-dir=C:/Users/Designer1/AppData/Local/Google/Chrome/User Data/Profile 1")
driver = webdriver.Chrome(chrome_options=options,executable_path="C:\webdrivers\chromedriver.exe")

I think the easiest way is to save your cookies after scanning the QR code and push them to Selenium manually.
# Load page to be able to set cookies
driver.get('http://web.whatsapp.com')
# Set saved cookies
cookies = {'name1': 'value1', 'name2': 'value2'}
for name in cookies:
    driver.add_cookie({
        'name': name,
        'value': cookies[name],
    })
# Load page using cookies
driver.get('http://web.whatsapp.com')
To get your cookies you can use the console (F12), Network tab, right click on the request, Copy => Copy Request Headers.
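
Instead of copying cookies by hand, they can also be read from a running session and stored in a file. This is only a sketch: the helper names and the file path are hypothetical, it uses JSON rather than the pickle mentioned in the other answer, and whether cookies alone are enough to restore a WhatsApp Web session is not confirmed here.

```python
import json


def save_cookies(driver, path):
    """Dump the cookies of the current session to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Add previously saved cookies to the current session.

    The driver must already be on a page of the cookies' domain,
    otherwise add_cookie rejects them. Returns the number of cookies added.
    """
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

A typical flow would be to call save_cookies(driver, "cookies.json") once after scanning the code, and on later runs to open the site, call load_cookies(driver, "cookies.json"), and load the page again.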

It should not behave like that. A new window is opened only when a new driver variable is initialized or when the program starts again. Here is the code for Chrome. No matter how many times you call driver.get(url), it opens the URL in the same browser window.
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(executable_path=r"C:\new\chromedriver.exe")
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
Let me know if the issue is resolved or you are trying to do something else.

How to download a HTML webpage using Selenium with python?

I want to download a webpage using Selenium with Python, using the following code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--save-page-as-mhtml')
d = DesiredCapabilities.CHROME
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
saveas = ActionChains(driver).key_down(Keys.CONTROL)\
    .key_down('s').key_up(Keys.CONTROL).key_up('s')
saveas.perform()
print("done")
However, the above code isn't working. I am using Windows 7.
Is there any way I can bring up the 'Save as' dialog box?
Thanks
Karan

You can use below code to download page HTML:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.yahoo.com")
with open("/path/to/page_source.html", "w", encoding='utf-8') as f:
    f.write(driver.page_source)
Just replace "/path/to/page_source.html" with the desired path and file name.
Update
If you need to get the complete page source (including CSS, JS, ...), you can use the following solution:
pip install pyahk # from command line
Python code:
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import ahk
firefox = FirefoxBinary("C:\\Program Files (x86)\\Mozilla Firefox\\firefox.exe")
from selenium import webdriver
driver = webdriver.Firefox(firefox_binary=firefox)
driver.get("http://www.yahoo.com")
ahk.start()
ahk.ready()
ahk.execute("Send,^s")
ahk.execute("WinWaitActive, Save As,,2")
ahk.execute("WinActivate, Save As")
ahk.execute("Send, C:\\path\\to\\file.htm")
ahk.execute("Send, {Enter}")

Python, error with web driver (Selenium)

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.get('http://arithmetic.zetamac.com/game?key=a7220a92')
element = driver.find_element_by_link_text('problem')
print(element)
I am getting the error:
FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver'
I am not sure why this is happening, because I have already imported selenium.

Either pass the ChromeDriver path to webdriver.Chrome or add its directory to the PATH variable.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
driverLocation = r'D:\Drivers\chromedriver.exe' # if Windows; raw string keeps the backslashes literal
driver = webdriver.Chrome(driverLocation)
driver.get('http://arithmetic.zetamac.com/game?key=a7220a92')
element = driver.find_element_by_link_text('problem')
print(element)
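
To see which of the two options applies, it can help to check whether the driver executable is reachable through PATH at all. The sketch below uses the standard library's shutil.which; the function name locate_driver and its extra_dirs parameter are hypothetical.

```python
import os
import shutil


def locate_driver(name="chromedriver", extra_dirs=()):
    """Return the full path of the driver executable, or None if not found.

    Searches the directories in extra_dirs first, then the PATH variable.
    """
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)
```

If locate_driver() returns None, the driver is not on PATH, and its full path has to be passed to webdriver.Chrome as in the snippet above.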

The best way to eliminate this exception without altering even a single line of code is to add chromedriver.exe (or any other browser driver file) to Python's
site_packages/scripts directory on Windows
dist_package/scripts on Linux
Please check this solution; it works.

If you are using a Mac, don't include '.exe'. I put the selenium package directly into my PyCharm project, which I called 'SpeechRecognition'. Then, inside the selenium folder, navigate to /selenium/webdriver/chrome and copy and paste the chromedriver file there, which you most likely downloaded from [here][1].
Try this script if you are using the PyCharm IDE or similar. It should open a new Google window for you.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome('/Users/Name/PycharmProjects/SpeechRecognition/selenium/webdriver/chrome/chromedriver')
browser.get('http://www.google.com')
Then, if you want to search for an item on Google automatically, add the lines below and run the script. You should see a Google search window open on its own. It might disappear quickly; to stop that, you can add a while loop or a timer.
search = browser.find_element_by_name('q')
search.send_keys('How do I search an item on Google?')
search.send_keys(Keys.RETURN)
[1]: https://sites.google.com/a/chromium.org/chromedriver/home

Selenium-python downloading but file is saved as .part

My script works, but it's saving the file as a .part, although when I check it against a manually downloaded file it's the same size and, thankfully, complete. I can't understand why it's being saved as a partial file, though. It's rather inconvenient for my next idea. Does anybody have an idea of why this might be? Here's my code, which works:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
import mechanize
import urllib
from urllib import urlretrieve
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",1)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir",'Users/matthewyoung/Downloads')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","Plain text")
fp.set_preference("browser.download.manager.scanWhenDone",False)
fp.set_preference("browser.download.manager.showAlertOnComplete",True)
fp.set_preference("browser.download.manager.useWindow",False)
fp.set_preference("browser.helperApps.alwaysAsk.force",False)
browser = webdriver.Firefox(firefox_profile=fp)
#browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://vizier.u-strasbg.fr/vizier/surveys.htx") # Load page
assert "VizieR" in browser.title
#p = raw_input('Star name? ')
elem = browser.find_element_by_name('-c') # Find the query box
elem.send_keys('mwc 560' + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
elem=browser.find_element_by_name('-out.max')
elem.send_keys('unlimited'+Keys.TAB)
elem2=browser.find_element_by_name('-out.form')
time.sleep(0.5)
elem2.send_keys('; -Separated-Values')
time.sleep(0.5)
elem2.send_keys(Keys.TAB)
elem2.send_keys(Keys.TAB)
time.sleep(0.2)
browser.find_element_by_class_name('data').submit()
time.sleep(3.0)
#df=elem2.send_keys(Keys.SPACE)
#print df
browser.close()

It is downloading as .part because the "save as" popup dialog appears, and Python cannot deal with that popup window. I have found that settings applied to a custom profile in webdriver do not necessarily work (for instance, I was able to set a custom profile in Selenium to download a csv but not a pdf). However, I was able to solve my pdf problem by creating a custom profile in Firefox itself. I am not very experienced with tsv files, so I am not sure which setting that would be. If you can create a new Firefox profile (following the instructions here: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles), you can try to set that profile to save tsv files by default. If you don't know the exact setting to change in "about:config", you can try just ticking the checkbox on the popup to always save those kinds of files.
From there, you point Selenium at the custom profile you created, like this:
profile = webdriver.firefox.firefox_profile.FirefoxProfile("/Users/matthewyoung/Library/Application Support/Firefox/Profiles/YOUR PROFILE NAME")
Keep in mind that YOUR PROFILE NAME will have a bunch of random letters first, so follow that path to find the actual profile name.
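
Since the folder name starts with random characters, it can be looked up by its suffix instead of being typed by hand. A small sketch, assuming the usual '<random prefix>.<profile name>' folder naming; the function name find_firefox_profile is hypothetical.

```python
from pathlib import Path


def find_firefox_profile(profiles_dir, profile_name):
    """Return the profile folder whose name ends with '.<profile_name>'.

    Firefox profile folders are named '<random prefix>.<profile name>',
    so the lookup matches on the suffix. Returns None when nothing matches.
    """
    matches = sorted(
        p for p in Path(profiles_dir).glob(f"*.{profile_name}") if p.is_dir()
    )
    return matches[0] if matches else None
```

The returned path, converted with str(), can then be passed to FirefoxProfile in place of the hand-written one.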

I think the only thing you are missing from your Firefox profile setting is the following
fp.set_preference("browser.helperApps.neverAsk.openFile", 'Plain Text')
So the entire code should be
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir",'Users/matthewyoung/Downloads')
fp.set_preference("browser.helperApps.neverAsk.openFile", 'Plain Text')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","Plain text")
fp.set_preference("browser.download.manager.scanWhenDone",False)
fp.set_preference("browser.download.manager.showAlertOnComplete",True)
fp.set_preference("browser.download.manager.useWindow",False)
fp.set_preference("browser.helperApps.alwaysAsk.force",False)
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://vizier.u-strasbg.fr/vizier/surveys.htx") # Load page
assert "VizieR" in browser.title
elem = browser.find_element_by_name('-c') # Find the query box
elem.send_keys('mwc 560' + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
elem=browser.find_element_by_name('-out.max')
elem.send_keys('unlimited'+Keys.TAB)
elem2=browser.find_element_by_name('-out.form')
time.sleep(0.5)
elem2.send_keys('; -Separated-Values')
time.sleep(0.5)
elem2.send_keys(Keys.TAB)
elem2.send_keys(Keys.TAB)
time.sleep(0.2)
browser.find_element_by_class_name('data').submit()
time.sleep(3.0)
browser.close()

The following value should be used for plain text:
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/plain")
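
Put together, the download preferences could be built as in the sketch below. The helper names are hypothetical. "text/plain" comes from the answer above; which MIME type the server actually sends for the semicolon-separated output is an assumption that should be checked, for example in the response headers.

```python
import os


def build_firefox_download_prefs(download_dir, mime_types):
    """Return Firefox preferences that save the given MIME types without asking."""
    return {
        # 2 means "use the directory given in browser.download.dir".
        "browser.download.folderList": 2,
        "browser.download.dir": os.path.abspath(download_dir),
        # Comma-separated list of MIME types to save without a dialog.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox_driver(download_dir, mime_types):
    """Create a Firefox driver with those preferences (needs selenium)."""
    # Imported lazily so build_firefox_download_prefs can be used on its own.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    prefs = build_firefox_download_prefs(download_dir, mime_types)
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

For example, make_firefox_driver("Downloads", ["text/plain", "text/csv"]) would cover both plain text and CSV responses.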
