Using Selenium in Python to save a webpage on Firefox

I am trying to use Selenium in Python to save webpages in Firefox on macOS.
So far, I have managed to press COMMAND + S to open the SAVE AS window. However,
I don't know how to:
change the directory of the file,
change the name of the file, and
click the SAVE AS button.
Could someone help?
Below is the code I used to press COMMAND + S:
ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()
The reason I am using this method is that I get a UnicodeEncodeError when I:
write the page_source to an HTML file, and
store scraped information in a csv file.
Write to an HTML file:
file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close()
Write to a csv file:
csv_file_write.writerow(to_write)
Error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)

with open('page.html', 'w') as f:
    f.write(driver.page_source)

What you are trying to achieve is not possible with Selenium alone. The Save As dialog is a native operating-system window, not part of the page, so Selenium cannot interact with it.
The closest thing you can do is collect the page_source, which gives you the HTML of the current page, and save that to a file.
import codecs
import os
# save_path and file_name are the variables from the question
completeName = os.path.join(save_path, file_name)
file_object = codecs.open(completeName, "w", "utf-8")
html = browser.page_source
file_object.write(html)
file_object.close()
If you really need to save the complete page through the browser's own dialog, you should look into a GUI automation tool such as AutoIt (which runs on Windows only). A tool of that kind makes it possible to interact with the save dialog.
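The snippet above handles the HTML file, but the question also hits the same error when writing the csv file. A minimal sketch of one way to handle that, assuming Python 3 (where the csv module accepts Unicode text once the file is opened with an explicit encoding); the function name `write_rows_utf8` is made up for illustration:

```python
import csv


def write_rows_utf8(path, rows):
    """Write an iterable of rows to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(rows)
```

A call such as `write_rows_utf8("scraped.csv", [to_write])` would then accept a row containing characters like `ø` without falling back to the ASCII codec.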

You cannot interact with system dialogs such as the save-file dialog.
If you want to save the page's HTML, you can do something like this:
page = driver.page_source
file_ = open('page.html', 'w')
file_.write(page)
file_.close()
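Note that a plain `open('page.html', 'w')` relies on the default encoding, which is what produced the UnicodeEncodeError in the question. A small sketch that pins the encoding instead, assuming Python 3; `save_html` is a hypothetical helper name:

```python
def save_html(html, path):
    """Write an HTML string to path as UTF-8, regardless of the system default."""
    with open(path, "w", encoding="utf-8") as html_file:
        html_file.write(html)
```

With a running driver this would be called as `save_html(driver.page_source, "page.html")`.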

This is a complete, working example of the answer RemcoW provided:
You first have to install a webdriver, e.g. pip install selenium chromedriver_installer.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# core modules
import codecs
import os
# 3rd party modules
from selenium import webdriver
def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'
    # executable_path is the older Selenium API; newer releases take a Service object instead
    browser = webdriver.Chrome(executable_path=path_to_chromedriver)
    return browser
save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()
url = "https://martin-thoma.com/"
browser.get(url)
complete_name = os.path.join(save_path, file_name)
file_object = codecs.open(complete_name, "w", "utf-8")
html = browser.page_source
file_object.write(html)
file_object.close()
browser.close()

Related

Save SVG File from wikipedia as SVG in python

Goal:
save SVG from wikipedia
Requirements:
Needs to be automated
Currently I am using Selenium to get some other information, and I tried to use a Python script like the one below to extract the SVG, but the extracted SVG file gives an error when rendering.
Edit:
The same error occurs when using requests. Maybe it has something to do with the file Wikipedia uploaded?
Error code:
[image: error message for the SVG file]
It renders part of the SVG later:
[image: rendered part of the SVG]
How it should look:
[image: map of Oslo, zoomed out]
[link: Wikipedia file]
Code imageEctractSingle.py:
from selenium import webdriver
DRIVER_PATH = 'chromedriver.exe'
link = 'https://upload.wikimedia.org/wikipedia/commons/7/75/NO_0301_Oslo.svg'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get(link)
image = driver.page_source
#Creates an SVG File
f = open("kart/oslo.svg", "w")
f.write(image)
f.close()
driver.close()
Original article that I get the file link from, by running through the table in the article.
Any ideas on how to extract this image? I know Chrome has a built-in Save As function; how can I access that through Selenium?
Or does there exist a tool for saving SVG files from Selenium?
Thanks in advance for any help :D

It's not Selenium, but I got it working with requests. You shouldn't need Selenium for something this simple unless you are doing more alongside it:
import requests
def write_text(data: str, path: str):
    with open(path, 'w') as file:
        file.write(data)
url = 'https://upload.wikimedia.org/wikipedia/commons/7/75/NO_0301_Oslo.svg'
svg = requests.get(url).text
write_text(svg, './NO_0301_Oslo.svg')
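Because `.text` decodes the response and `open(path, 'w')` re-encodes it with the platform default, writing the raw bytes avoids any encoding round trip. A sketch of that variant using only the standard library (`urllib.request` swapped in for requests); the function name and the User-Agent string are placeholders, and whether the server insists on a particular User-Agent is an assumption:

```python
import shutil
import urllib.request


def download_file(url, dest_path, user_agent="svg-download-example/0.1"):
    """Stream the response body to dest_path without decoding it."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Copy bytes straight from the response to the file, so the SVG is
    # stored exactly as the server sent it.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path
```

It would be called as `download_file(url, './NO_0301_Oslo.svg')` with the `url` from the snippet above.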

Looping links in csv file using selenium in python

I am trying to open a .csv file, open each link in it with Selenium, and loop through all the links. I am new to Selenium; I can easily do this in Beautiful Soup. Can you please point me in the right direction?
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import csv
import requests
contents =[]
filename = 'link_business_filter.csv'
def copy_json():
    with open('vendors_info_bangkok.json',"a") as wt:
        for x in script3:
            wt.write(x)
        wt.close()
    return
with open(filename,'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)
for link in contents:
    url_html = requests.get(link)
    browser = webdriver.Chrome('chromedriver')
    for link_loop in url_html:
        open = browser.get(link_loop)
        source = browser.page_source
        data = bs(source,"html.parser")
        body = data.find('body')
        script = body
        x_path = '//*[@id="react-root"]/section/main/div'
        script2 = browser.find_element_by_xpath(x_path)
        script3 = script2.text
        print(script3)
        copy_json()

First install Selenium:
pip install selenium
Then install chromedriver for your OS. To test it, open a terminal in the folder where you keep the driver and type chromedriver; if there is no error, it works.
Then in your code you need to provide executable_path for chromedriver.
In your code:
....code...
for link in contents:
    url_html = requests.get(link)
    path_to_chromedriver = 'C:/Users/chromedriver.exe' #<-- you can keep this file anywhere you wish
    browser = webdriver.Chrome(executable_path=path_to_chromedriver) #<-- you can also give the path directly here
    for link_loop in url_html:
        ...code...
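Beyond the driver path, the loop itself can be simplified: read the links once, start one browser, and pass each link to it directly, instead of iterating over a requests response. A rough sketch under those assumptions; `read_links` and `fetch_pages` are hypothetical names, and `driver` is any object with Selenium's `get` method and `page_source` attribute:

```python
import csv


def read_links(csv_path):
    """Return the first column of every non-empty row in the CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        return [row[0] for row in csv.reader(csv_file) if row]


def fetch_pages(driver, links):
    """Load each link in the same browser and return {link: page source}."""
    pages = {}
    for link in links:
        driver.get(link)
        pages[link] = driver.page_source
    return pages
```

The driver would be created once, before the loop, and passed in as `fetch_pages(browser, read_links(filename))`, so that a new Chrome window is not started for every link.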

Getting HTML file from a webpage that is already opened in a browser in python 3

I have been searching the internet for an answer to this, but so far I haven't found quite what I was looking for. I am able to open a webpage via the Python webbrowser module, but what I want to know is how to download the HTML of the page that Python has asked the browser (Firefox in this case) to open. Certain webpages have sections that I cannot fully access without a browser extension (MetaMask), because they also require a login from within that extension; that login happens automatically if I open Firefox normally or through the webbrowser module. This is why requesting the HTML from a URL directly in Python, with code such as this, doesn't work:
import requests
url = 'https://www.google.com/'
r = requests.get(url)
r.text
from urllib.request import urlopen
with urlopen(url) as f:
    html = f.read()
The only solution I have so far is to open the webpage with the webbrowser module and then use the pyautogui module to make my PC press Ctrl+S (the Firefox hotkey for saving the HTML of the current page) and then press Enter.
import webbrowser
import pyautogui
import time
def get_html():
    url='https://example.com/'
    webbrowser.open_new(url) #Open webpage in default browser (firefox)
    time.sleep(1.2)
    pyautogui.hotkey('ctrl', 's')
    time.sleep(1)
    pyautogui.press('enter')
get_html()
However, I was wondering if there is a more sophisticated and efficient way that doesn't involve simulated key pressing with pyautogui.

Can you try the following:
import requests
url = 'https://www.google.com/'
r = requests.get(url)
with open('page.html', 'w') as outfile:
    outfile.write(r.text)
If the above solution doesn't work, you can use the Selenium library to open a browser:
import time
from selenium import webdriver
driver = webdriver.Firefox()
driver.get(url)
time.sleep(2)
with open('page.html', 'w') as f:
    f.write(driver.page_source)

How to download file from a page using python

I am having trouble downloading the txt file from this page: https://www.ceps.cz/en/all-data#RegulationEnergy (scroll down to see Download: txt, xls and xml).
My goal is to create a scraper that goes to the linked page, clicks on the txt link, for example, and saves the downloaded file.
Main problems that I am not sure how to solve:
The file doesn't have a real link that I can call to download it; the link is created with JS based on the filters and file type.
When I use the requests library for Python and call the link with all headers, it just redirects me to https://www.ceps.cz/en/all-data .
Approaches tried:
Using a scraper such as ParseHub to download the link didn't work as intended, but this scraper came closest to what I wanted.
Using the requests library to connect to the link with the headers that the XHR request uses for downloading the file, but it just redirects me to https://www.ceps.cz/en/all-data .
If you could propose a solution for this task, thank you in advance. :-)

You can download this data to a directory of your choice with Selenium; you just need to specify the directory in which the data will be saved. In the code below, I save the txt data to my desktop:
from selenium import webdriver
from selenium.webdriver.common.by import By
download_dir = '/Users/doug/Desktop/'
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': download_dir}
chrome_options.add_experimental_option('prefs', prefs)
# options= replaces the deprecated chrome_options= keyword
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.ceps.cz/en/all-data')
# find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4
container = driver.find_element(By.CLASS_NAME, 'download-graph-data')
button = container.find_element(By.TAG_NAME, 'li')
button.click()
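One thing the snippet leaves open is that `button.click()` returns before the download has finished, so a script that exits or reads the file straight away may find nothing there. A rough sketch of a polling helper, assuming the download directory starts out empty and that Chrome marks unfinished downloads with a `.crdownload` suffix; `wait_for_download` is a made-up name:

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Return the path of a finished file in directory (first by name).

    Names ending in .crdownload are treated as still in progress and are
    ignored. Raises TimeoutError if no finished file shows up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            name for name in sorted(os.listdir(directory))
            if not name.endswith(".crdownload")
        ]
        if finished:
            return os.path.join(directory, finished[0])
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in " + directory)
        time.sleep(poll_interval)
```

It would be called right after the click, for example `path = wait_for_download(download_dir)`.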

You can do it like this:
import requests
txt_format = 'txt'
xls_format = 'xls'  # written in binary mode
xml_format = 'xml'  # written in binary mode
def download(file_type):
    url = f'https://www.ceps.cz/download-data/?format={file_type}'
    response = requests.get(url)
    if file_type == txt_format:
        with open(f'file.{file_type}', 'w') as file:
            file.write(response.text)
    else:
        with open(f'file.{file_type}', 'wb') as file:
            file.write(response.content)
download(txt_format)

How to set the path to a browser executable with python webbrowser

I am trying to build a utility function to output Beautiful Soup code to a browser. I have the following code:
def bs4_to_browser(bs4Tag):
    import os
    import webbrowser
    html= str(bs4Tag)
    # html = '<html> ... generated html string ...</html>'
    path = os.path.abspath('temp.html')
    url = 'file://' + path
    with open(path, 'w') as f:
        f.write(html)
    webbrowser.open(url)
    return
This works great and opens the HTML in the default browser. However, I would like to set the path to a portable Firefox executable, which is at:
F:\FirefoxPortable\firefox.exe
I am using Windows 7. How do I set the path to the portable Firefox executable?

You could start your portable Firefox directly with the url as an argument instead.
from subprocess import call
call(["F:\\FirefoxPortable\\firefox.exe", "-new-tab", url])

I know the question is old, but here is code that works with webbrowser and Python 3.11:
import webbrowser
myfirefox = webbrowser.Mozilla("F:\\FirefoxPortableESR\\FirefoxPortable.exe")
myfirefox.open(url)
As you will see, it works even if the .exe is not the "real" Firefox.
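Another option along the same lines is to register the executable under a name of your own, so that it can be fetched with `webbrowser.get` anywhere in the program. A minimal sketch; the helper name `register_browser` and the browser name used below are arbitrary choices, not part of the webbrowser module:

```python
import webbrowser


def register_browser(name, executable_path):
    """Register an executable under name and return the controller for it."""
    # BackgroundBrowser launches the given executable with the URL as its
    # argument and does not wait for it to exit.
    controller = webbrowser.BackgroundBrowser(executable_path)
    webbrowser.register(name, None, controller)
    return webbrowser.get(name)
```

For the setup in the question this would look like `register_browser("portable-firefox", "F:\\FirefoxPortable\\firefox.exe").open(url)`.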
