I am trying to open an HTML file from Python but my script just displays the contents of the HTML file in Python instead of opening it in the browser. How can I fix this problem? How can I open the HTML file in my Chrome browser?
testdata.html
<div>
<img src="https://plot.ly/~user001/2.png" alt="Success vs Failure" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" />
<script data-plotly="user001:2" src="https://plot.ly/embed.js" async></script>
</div>
Python 2.7 script:
import urllib
page = urllib.urlopen('testdata.html').read()
print page
Try specifying "file://" at the start of the URL. Also, use the absolute path of the file:
import os
import webbrowser
webbrowser.open('file://' + os.path.realpath(filename))
Or
import webbrowser
new = 2 # open in a new tab, if possible
# open a public URL, in this case, the webbrowser docs
url = "http://docs.python.org/library/webbrowser.html"
webbrowser.open(url,new=new)
# open an HTML file on my own (Windows) computer, here on drive D:
url = "file:///D:/testdata.html"
webbrowser.open(url,new=new)
import os
os.system("start [your_url]") # Windows only; replace [your_url] with the address to open
Enjoy!
You can use the webbrowser library:
import webbrowser
url = 'file:///path/to/your/file/testdata.html'
webbrowser.open(url, new=2) # open in new tab
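The "file://" prefix and the absolute path can also be produced in one step with pathlib. The following is only a minimal sketch; the helper names local_file_url and open_local_html are made up for illustration and are not part of the webbrowser module.

```python
from pathlib import Path
import webbrowser


def local_file_url(filename):
    """Return a file:// URL for a local file (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() adds the scheme and
    # percent-encodes characters such as spaces.
    return Path(filename).resolve().as_uri()


def open_local_html(filename):
    """Open a local HTML file in the default browser, in a new tab if possible."""
    return webbrowser.open(local_file_url(filename), new=2)


# Usage (opens a browser tab, so it is left commented out here):
# open_local_html("testdata.html")
```

This opens whatever browser the system treats as the default, which may or may not be Chrome.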
Here's a way that doesn't require external libraries and that works on local files as well.
import subprocess
import os
url = "https://stackoverflow.com"
# or a file on your computer
# url = "/Users/yourusername/Desktop/index.html"
try: # should work on Windows
    os.startfile(url)
except AttributeError:
    try: # should work on macOS; most Linux desktops use 'xdg-open' instead
        subprocess.call(['open', url])
    except OSError:
        print('Could not open URL')
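The same idea can be written as an explicit platform check instead of nested try blocks. This is a sketch under assumptions: the function names are hypothetical, and "xdg-open" is assumed to be installed on the Linux desktop in question.

```python
import os
import subprocess
import sys


def opener_command(target, platform=sys.platform):
    """Return the command list that opens `target`, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows uses os.startfile() instead of a command
    if platform == "darwin":
        return ["open", target]  # macOS
    return ["xdg-open", target]  # assumption: a Linux desktop with xdg-utils


def open_target(target):
    """Open a URL or local file with the platform's default handler."""
    command = opener_command(target)
    if command is None:
        os.startfile(target)  # this function only exists on Windows
        return True
    try:
        return subprocess.call(command) == 0
    except OSError:
        return False  # the opener command is not installed


# Usage (launches a browser, so it is left commented out here):
# open_target("https://stackoverflow.com")
```

Keeping the command selection in its own function makes it possible to check the choice for each platform without launching anything.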
You can use Selenium.
Download the latest chromedriver and paste chromedriver.exe into "C:\Python27\Scripts".
Then:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("your page path")
print driver.page_source.encode('utf-8')
driver.quit()
I feel this is the easiest solution:
import os
os.getcwd() # check the current working directory
os.chdir("D:\\Folder Name\\") # D:\Folder Name\ is the folder where the HTML file will be saved
import webbrowser
df.to_html("filename.html") # convert the DataFrame df to HTML and save it as 'filename.html'
webbrowser.get("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s").open("file://" + os.path.realpath("filename.html"))
You can download the latest version of geckodriver and add the executable to your project. Then run pip install selenium and use the code below (for Windows):
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import os
# optional
options = Options()
options.set_preference('permissions.default.image', 2)
options.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)
# for Windows
driver = webdriver.Firefox(options=options, executable_path='geckodriver.exe')
driver.implicitly_wait(15)
# path of your project -> reference: "https://stackoverflow.com/questions/25389095/python-get-path-of-root-project-structure/40227116"
Root = os.path.dirname(os.path.abspath(__file__))
driver.get('file://' + os.path.join(Root, 'path/to/htmlfile'))
Hope I Helped You:)
import os
os.system('open "/Applications/Safari.app" '+ '"' + os.path.realpath(fname)+ '"')
Related
website:
https://www.ting22.com/ting/659-2.html
I'd like to get some audiobooks from the website above. In other words, I want to download the MP3 files of the audiobook from 659-2.html to 659-1724.html.
Using the F12 developer tools, under [Network]->[Media], I can see the request URL of the MP3 file, but I don't know how to get that URL from a script.
Here are some specs of what I'm using:
System: Windows 7 x64
Python: 3.7.0
Update:
For example, using the F12 tools, I can see that the file's URL is "http://audio.xmcdn.com/group58/M03/8D/07/wKgLc1zNaabhA__WAEJyyPUT5k4509.mp3"
But I don't know how to get the URL of the MP3 file in code (as opposed to how to download the file).
Which library should I use?
Thank you.
UPDATE
Well, that would be a bit more complicated because the requests package won't return the .mp3 source, so you need to use Selenium. Here is a tested solution:
from selenium import webdriver # pip install selenium
import urllib3
import shutil
import os
if not os.path.exists(os.getcwd()+'/mp3_folder'):
    os.mkdir(os.getcwd()+'/mp3_folder')
def downloadFile(url=None):
    filename = url.split('/')[-1]
    c = urllib3.PoolManager()
    with c.request('GET', url, preload_content=False) as resp, open('mp3_folder/'+filename, 'wb') as out_file:
        shutil.copyfileobj(resp, out_file)
    resp.release_conn()
driver = webdriver.Chrome('chromedriver.exe') # download chromedriver from here and place it near the script: https://chromedriver.storage.googleapis.com/72.0.3626.7/chromedriver_win32.zip
for i in range(2, 1725):
    try:
        driver.get('https://www.ting22.com/ting/659-%s.html' % i)
        src = driver.find_element_by_id('mySource').get_attribute('src')
        downloadFile(src)
        print(src)
    except Exception as exc:
        print(exc)
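The two pieces of bookkeeping in the script above, generating the page URLs and deriving a local filename from the MP3 URL, can be separated from the browser work. A minimal sketch follows; the helper names are hypothetical, and using urlsplit is a swapped-in alternative to the plain split('/') above so that a query string, if one ever appears, does not end up in the filename.

```python
import posixpath
from urllib.parse import urlsplit

PAGE_TEMPLATE = "https://www.ting22.com/ting/659-%s.html"


def page_urls(first=2, last=1724):
    """Return the page URLs from 659-<first>.html to 659-<last>.html inclusive."""
    return [PAGE_TEMPLATE % i for i in range(first, last + 1)]


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(url).path)
```

The Selenium loop would then iterate over page_urls() and pass each src attribute to filename_from_url before saving.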
I am trying to open a .csv file, open each link in it with Selenium, and loop through all the links. I am new to Selenium; I can easily do this with Beautiful Soup. Can you please point me in the right direction?
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import csv
import requests
contents =[]
filename = 'link_business_filter.csv'
def copy_json():
    with open('vendors_info_bangkok.json',"a") as wt:
        for x in script3:
            wt.write(x)
        wt.close()
    return
with open(filename,'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)
for link in contents:
    url_html = requests.get(link)
    browser = webdriver.Chrome('chromedriver')
    for link_loop in url_html:
        open = browser.get(link_loop)
        source = browser.page_source
        data = bs(source,"html.parser")
        body = data.find('body')
        script = body
        x_path = '//*[@id="react-root"]/section/main/div'
        script2 = browser.find_element_by_xpath(x_path)
        script3 = script2.text
        print(script3)
        copy_json()
First install selenium:
pip install selenium
Then install chromedriver for your OS. To test it, open a terminal in the folder where you keep the driver and type chromedriver; if there's no error, it works.
Then in your code you need to provide executable_path for chromedriver.
In your code:
....code...
for link in contents:
    url_html = requests.get(link)
    path_to_chromedriver = 'C:/Users/chromedriver.exe' #<-- you can keep this file anywhere you wish
    browser = webdriver.Chrome(executable_path=path_to_chromedriver) #<-- you can also give the path directly here
    for link_loop in url_html:
        ...code...
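For the CSV part of the question, one possible structure is to read the links first and then reuse a single browser for all of them, rather than starting a new Chrome for every row. The sketch below is an assumption about what the asker wants; read_links and visit_all are hypothetical names, and browser stands for any object with a get() method and a page_source attribute, such as a Selenium WebDriver.

```python
import csv


def read_links(lines):
    """Return the first column of every non-empty CSV row."""
    links = []
    for row in csv.reader(lines):
        if row and row[0].strip():
            links.append(row[0].strip())
    return links


def visit_all(browser, links):
    """Load each link in one shared browser and collect the page source."""
    sources = {}
    for link in links:
        browser.get(link)
        sources[link] = browser.page_source
    return sources


# Usage (needs Selenium and chromedriver, so it is left commented out here):
# from selenium import webdriver
# with open('link_business_filter.csv', 'rt') as f:
#     links = read_links(f)
# browser = webdriver.Chrome()
# pages = visit_all(browser, links)
# browser.quit()
```

Since csv.reader accepts any iterable of lines, read_links works on an open file as well as on a plain list of strings.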
I have a Selenium script in Python (using ChromeDriver on Windows) that fetches the download links of various attachments (of different file types) from a page and then opens these links to download the attachments. This works fine for the file types that ChromeDriver can't preview, as they get downloaded by default. But images (JPEG, PNG) and PDFs are previewed by default and hence aren't automatically downloaded.
The ChromeDriver options I am currently using (work for non preview-able files) :
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory' : 'custom_download_dir'}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome("./chromedriver.exe", chrome_options=chrome_options)
This downloads the files to 'custom_download_dir', no issues. But the preview-able files are just previewed in the ChromeDriver instance and not downloaded.
Are there any ChromeDriver Settings that can disable this preview behavior and directly download all files irrespective of the extensions?
If not, can this be done using Firefox for instance?
Instead of relying on specific browser/driver options, I would implement a more generic solution that uses the image URL to perform the download.
You can get the image URL with code similar to this:
driver.find_element_by_id("your-image-id").get_attribute("src")
And then I would download the image using, for example, urllib.
Here's some pseudo-code for Python2:
import urllib
url = driver.find_element_by_id("your-image-id").get_attribute("src")
urllib.urlretrieve(url, "local-filename.jpg")
Here's the same for Python3:
import urllib.request
url = driver.find_element_by_id("your-image-id").get_attribute("src")
urllib.request.urlretrieve(url, "local-filename.jpg")
Edit after the comment, just another example about how to download a file once you know its URL:
import requests
from PIL import Image
from io import BytesIO
image_name = 'image.jpg'
url = 'http://example.com/image.jpg'
r = requests.get(url)
i = Image.open(BytesIO(r.content)) # r.content is bytes, so it needs BytesIO rather than StringIO
i.save(image_name)
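For the PDF half of the original question, a browser-side option may also exist. The sketch below builds a prefs dictionary in the same shape as the one in the question; the two extra preference names are assumptions based on commonly used Chrome settings and should be verified against the Chrome version in use. It does not change how images are handled, so the URL-based download above still applies to those. The function name build_download_prefs is hypothetical.

```python
def build_download_prefs(download_dir):
    """Return a Chrome 'prefs' dict that favors saving files over previewing."""
    return {
        "download.default_directory": download_dir,
        # Assumed preference names; verify them against your Chrome version.
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,  # save PDFs instead of previewing
    }


# Usage, following the options code from the question (commented out because
# it needs Selenium and chromedriver):
# chrome_options = webdriver.ChromeOptions()
# chrome_options.add_experimental_option("prefs", build_download_prefs("custom_download_dir"))
```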
With the selenium-wire library, it is possible to download images via ChromeDriver.
I have defined the following function to parse each request and save the request body to a file when necessary.
import os
from mimetypes import guess_extension
from seleniumwire import webdriver
def download_assets(requests, asset_dir="temp", default_fname="untitled", exts=[".png", ".jpeg", ".jpg", ".svg", ".gif", ".pdf", ".ico"]):
    asset_list = {}
    for req_idx, request in enumerate(requests):
        # request.headers
        # request.response.body is the raw response body in bytes
        if request.response is None:
            # No response was captured for this request
            continue
        ext = guess_extension(request.response.headers['Content-Type'].split(';')[0].strip())
        if ext is None or ext not in exts:
            # Don't know the file extension, or not in the whitelist
            continue
        # Construct a filename
        fname = os.path.basename(request.url.split('?')[0])
        fname = "".join(x for x in fname if (x.isalnum() or x in "._- "))
        if fname == "":
            fname = f"{default_fname}_{req_idx}"
        if not fname.endswith(ext):
            fname = f"{fname}{ext}"
        fpath = os.path.join(asset_dir, fname)
        # Save the file
        print(f"{request.url} -> {fpath}")
        asset_list[fpath] = request.url
        with open(fpath, "wb") as file:
            file.write(request.response.body)
    return asset_list
Let's download some images from the Google homepage to the temp folder.
# Create a new instance of the Chrome/Firefox driver
driver = webdriver.Chrome()
# Go to the Google home page
driver.get('https://www.google.com')
# Download content to temp folder
asset_dir = "temp"
os.makedirs(asset_dir, exist_ok=True)
download_assets(driver.requests, asset_dir=asset_dir)
driver.close()
Note that the function can be improved such that the directory structure can be kept as well.
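The extension-filtering step of download_assets can be pulled out and checked on its own. This is a minimal sketch: ext_for_content_type and ALLOWED_EXTS are hypothetical names, and the allow-list here is a shortened version of the one above.

```python
from mimetypes import guess_extension

ALLOWED_EXTS = {".png", ".gif", ".pdf", ".svg"}


def ext_for_content_type(content_type, allowed=ALLOWED_EXTS):
    """Map a Content-Type header value to an allowed extension, else None."""
    if not content_type:
        return None  # header missing or empty
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    ext = guess_extension(mime)
    return ext if ext in allowed else None
```

Handling a missing header here means the caller can simply skip any request for which the function returns None.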
Here is another simple way, but @Pitto's answer above is slightly more succinct.
import requests
from selenium.webdriver.common.by import By
# ff is an existing Selenium WebDriver instance, e.g. webdriver.Firefox()
webelement_img = ff.find_element(By.XPATH, '//img')
url = webelement_img.get_attribute('src') or 'https://someimages.com/path-to-image.jpg'
data = requests.get(url).content
local_filename = 'filename_on_your_computer.jpg'
with open(local_filename, 'wb') as f:
    f.write(data)
I am trying to build a utility function to output Beautiful Soup code to a browser. I have the following code:
def bs4_to_browser(bs4Tag):
    import os
    import webbrowser
    html= str(bs4Tag)
    # html = '<html> ... generated html string ...</html>'
    path = os.path.abspath('temp.html')
    url = 'file://' + path
    with open(path, 'w') as f:
        f.write(html)
    webbrowser.open(url)
    return
This works great and opens up the HTML in the default browser. However I would like to set the path to a portable firefox executable which is at:
F:\FirefoxPortable\firefox.exe
I am using Windows 7. How do I set the path to the portable Firefox executable?
You could start your portable Firefox directly with the url as an argument instead.
from subprocess import call
call(["F:\\FirefoxPortable\\firefox.exe", "-new-tab", url])
I know the question is old, but here is code that works with webbrowser and Python 3.11:
import webbrowser
myfirefox = webbrowser.Mozilla("F:\\FirefoxPortableESR\\FirefoxPortable.exe")
myfirefox.open(url)
As you will see, it works even if the .exe is not the "real" firefox.
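Putting the pieces of this thread together, the HTML can be written to a temporary file and handed to a specific browser executable. The following is a sketch under assumptions: write_temp_html and browser_command are hypothetical helpers, and the "-new-tab" argument is taken from the subprocess answer above.

```python
import tempfile
from pathlib import Path


def write_temp_html(html, suffix=".html"):
    """Write an HTML string to a new temporary file and return its path."""
    # delete=False keeps the file on disk so the browser can read it later.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
    return Path(handle.name)


def browser_command(browser_exe, html_path):
    """Build the argument list that opens html_path in a specific browser."""
    return [browser_exe, "-new-tab", Path(html_path).resolve().as_uri()]


# Usage (launches Firefox, so it is left commented out here):
# import subprocess
# path = write_temp_html(str(bs4Tag))
# subprocess.call(browser_command("F:\\FirefoxPortable\\firefox.exe", path))
```

A unique temporary file avoids overwriting a fixed temp.html when the function is called several times in a row; the files are not cleaned up automatically.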
import urllib
def open():
    return urllib.urlopen('http://example.com')
But when example.com opens it does not render CSS or JavaScript. How can I open the webpage in a web browser?
@error(404)
def error404(error):
    return webbrowser.open('http://example.com')
I am using bottle. Giving me the error:
TypeError("'bool' object is not iterable",)
Use the webbrowser module:
import webbrowser
webbrowser.open('http://example.com') # Go to example.com
import webbrowser
webbrowser.open(url, new=0, autoraise=True)
Display url using the default browser. If new is 0, the url is opened in the same browser window if possible. If new is 1, a new browser window is opened if possible. If new is 2, a new browser page (“tab”) is opened if possible. If autoraise is True, the window is raised if possible.
webbrowser.open_new(url)
Open url in a new window of the default browser.
webbrowser.open_new_tab(url)
Open url in a new page (“tab”) of the default browser.
On Windows
import os
os.system("start \"\" https://example.com")
On macOS
import os
os.system("open https://example.com")
On Linux
import os
os.system("xdg-open https://example.com")
Cross-Platform
import webbrowser
webbrowser.open('https://example.com')
You have to read the data too.
Check out http://www.doughellmann.com/PyMOTW/urllib2/ to understand it.
response = urllib2.urlopen(..)
headers = response.info()
data = response.read()
Of course, what you want is to render it in a browser, and aaronasterling's answer does that.
You could also try:
import os
os.system("start \"\" http://example.com")
Unlike @aaronasterling's answer, this has the advantage that it opens the default web browser.
Be sure not to forget the "http://".
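A small guard can add the scheme when it is missing. This is a minimal sketch; ensure_scheme is a hypothetical helper, and defaulting to "http" simply mirrors the advice above.

```python
def ensure_scheme(url, default_scheme="http"):
    """Prepend a scheme to `url` when it does not already contain one."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


# Usage (opens a browser, so it is left commented out here):
# import webbrowser
# webbrowser.open(ensure_scheme("example.com"))
```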
Here is another way to do it.
import webbrowser
webbrowser.open("foobar.com")
I think the easy way to open a URL is to use this function:
webbrowser.open_new_tab(url)