Inserting an HTML file into a Python file - python

I'm using PyCharm on Windows 10 and I'd like to use an HTML file inside a Python file. What should I do? I've already written my code, but the web page doesn't seem to render the HTML file.
To illustrate, here is my code:
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def home():
    return render_template("home.html")

@app.route('/about/')
def about():
    return render_template("about.html")

if __name__ == "__main__":
    app.run(debug=True)
And after running this Python file locally, I'd like these HTML pages to work, but the program doesn't seem to find them. Where should I put these HTML files, or what should I do with them? I currently have them all in a single folder on my PC.
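For reference, Flask's render_template() resolves names against a templates/ folder next to the application file (or wherever template_folder points), so the HTML files need to live there rather than alongside arbitrary files. A minimal sketch; the temporary directory and file contents here are only for the demonstration:

```python
import os
import tempfile
from flask import Flask, render_template

# render_template("home.html") resolves to <template_folder>/home.html;
# by default template_folder is a directory named "templates" next to
# the application file.
tpl_dir = os.path.join(tempfile.mkdtemp(), "templates")
os.makedirs(tpl_dir)
with open(os.path.join(tpl_dir, "home.html"), "w") as f:
    f.write("<h1>Home</h1>")

app = Flask(__name__, template_folder=tpl_dir)

@app.route('/')
def home():
    return render_template("home.html")

# Exercise the route without starting a server.
print(app.test_client().get('/').data.decode())
```

In a real project you would simply create a templates/ directory next to your script and drop home.html and about.html into it, with no template_folder argument needed.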

Use BeautifulSoup. Here's an example where a meta tag is inserted right after the title tag using insert_after():
from bs4 import BeautifulSoup as Soup

html = """
<html>
<head>
<title>Test Page</title>
</head>
<body>
<div>test</div>
</body>
</html>
"""

soup = Soup(html, 'html.parser')
title = soup.find('title')

# Build the new <meta> tag and set its attributes.
meta = soup.new_tag('meta')
meta['content'] = "text/html; charset=UTF-8"
meta['http-equiv'] = "Content-Type"

# Insert it immediately after the <title> tag.
title.insert_after(meta)
print(soup)
prints:
<html>
<head>
<title>Test Page</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
</head>
<body>
<div>test</div>
</body>
</html>
You can also find the head tag and use insert() with a specified position:
head = soup.find('head')
head.insert(1, meta)
Also see:
Add parent tags with beautiful soup
How to append a tag after a link with BeautifulSoup

Related

Why doesn't this HTML template render the scraped website data in the web browser (Python, Flask, HTML, Web-Scraping)?

Whenever I print the scraped data in the terminal it shows fine, but when I try to serve it using Python Flask, the HTML template does not render the data in the web browser. I'd appreciate help fixing this code.
Python (Flask) file:
from flask import Flask, render_template
from bs4 import BeautifulSoup as BS
import requests

src = requests.get('https://webscraper.netlify.app/').text
scraper = BS(src, 'lxml')

# head = scraper.find('main').select_one('article:nth-of-type(4)').div.text
# author = scraper.find('main').select_one('p').text
head = scraper.body.header.h1.text
snd_author = scraper.body.main.select_one('article:nth-of-type(2)').p.text
fst_article = scraper.body.main.article.div

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html', **locals())

app.run(debug=True)
HTML (view) file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=2.0"/>
<title>Python Flask Web Scraper</title>
</head>
<body>
<!-- Python Flask Variables go here: -->
<h1> {{ head }} </h1>
<p>{{ snd_author }}</p>
<article>{{ fst_article }}</article>
</body>
</html>
You should instead pass the template variables explicitly. Inside index(), locals() only contains that function's local names; head, snd_author, and fst_article are module-level globals, so **locals() passes nothing to the template:
return render_template('index.html', head=head, snd_author=snd_author, fst_article=fst_article)
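The behaviour of locals() inside a function can be checked in isolation; this standalone sketch (names are illustrative, not from the thread) shows why the template received no variables:

```python
head = "module-level value"  # stands in for the scraped globals

def index():
    # locals() here is the function's own local namespace only;
    # module-level names like `head` are not included in it.
    return dict(locals())

print(index())  # an empty dict: nothing would reach render_template
```

Moving the scraping into the view function, or passing keyword arguments explicitly as above, both fix this.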

How can I access this type of site using requests? [duplicate]

This question already has answers here:
Scraper in Python gives "Access Denied"
(3 answers)
Closed 2 years ago.
This is the first time I've encountered a site that won't 'allow me access' to the page. I'm not sure why, and I can't figure out how to scrape this website.
My attempt:
import requests
from bs4 import BeautifulSoup

def html(url):
    return BeautifulSoup(requests.get(url).content, "lxml")

url = "https://www.g2a.com/"
soup = html(url)
print(soup.prettify())
Output:
<html>
<head>
<title>
Access Denied
</title>
</head>
<body>
<h1>
Access Denied
</h1>
You don't have permission to access "http://www.g2a.com/" on this server.
<p>
Reference #18.4d24db17.1592006766.55d2bc1
</p>
</body>
</html>
I've looked into it for a while now and found that there is supposed to be some type of token [access, refresh, etc...].
I also found action="/search", but I wasn't sure what to do with just that.
This page requires certain HTTP headers to be set before it returns the content (here, Accept-Language):
import requests
from bs4 import BeautifulSoup

headers = {'Accept-Language': 'en-US,en;q=0.5'}

def html(url):
    return BeautifulSoup(requests.get(url, headers=headers).content, "lxml")

url = "https://www.g2a.com/"
soup = html(url)
print(soup.prettify())
Prints:
<!DOCTYPE html>
<html lang="en-us">
<head>
<link href="polyfill.g2a.com" rel="dns-prefetch"/>
<link href="images.g2a.com" rel="dns-prefetch"/>
<link href="id.g2a.com" rel="dns-prefetch"/>
<link href="plus.g2a.com" rel="dns-prefetch"/>
... and so on.
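As a side note, the same headers can be attached with only the standard library, and building the Request object lets you confirm what will be sent before making any network call. The User-Agent value below is illustrative, not from the thread:

```python
import urllib.request

headers = {
    'Accept-Language': 'en-US,en;q=0.5',
    'User-Agent': 'Mozilla/5.0',  # illustrative value
}
req = urllib.request.Request('https://www.g2a.com/', headers=headers)

# urllib normalizes header names to Capitalized-lowercase form,
# so the stored key is 'Accept-language'.
print(req.get_header('Accept-language'))  # en-US,en;q=0.5
print(req.get_full_url())                 # https://www.g2a.com/
```

Passing this req to urllib.request.urlopen() would then perform the actual fetch with those headers.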

Write HTML received as string to browser

I have a basic HTML file which looks like this:
<!DOCTYPE html>
<html>
<head>
<title>Home</title>
</head>
<body>
<h1>Welcome!</h1>
</body>
</html>
I am receiving the file in Python and storing it as a string. Is there a way I can write this out to a web browser?
The file is on my computer, but my goal is not to save it as an .html file and then open it; rather, I want to send the string from within Python straight to the browser.
I know that with JavaScript I can use Document.write() to inject content to a webpage, but that is already being done in the browser. I want to achieve something similar.
You can use flask, a simple Python web framework, to serve the string:
Using flask (pip install flask):
import flask

app = flask.Flask(__name__)

s = """
<!DOCTYPE html>
<html>
<head>
<title>Home</title>
</head>
<body>
<h1>Welcome!</h1>
</body>
</html>
"""

@app.route('/')
def home():
    return s

if __name__ == '__main__':
    app.debug = True
    app.run()
Now you can navigate to 127.0.0.1:5000, or whatever host and port are printed when the app runs.
You could do the following:
html = """<!DOCTYPE html>
<html>
<head>
<title>Home</title>
</head>
<body>
<h1>Welcome!</h1>
</body>
</html>"""
with open('html_file.html', 'w') as f:
    f.write(html)

import webbrowser, os
webbrowser.open('file://' + os.path.realpath('html_file.html'))

keeping html entities when using BeautifulSoup in python

I'm trying to keep HTML entities while parsing an HTML page.
Here is the html code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<div>{"key": "my &quot;value&quot;" }</div>
</body>
</html>
Here's my Python (2.7.11) code:
from bs4 import BeautifulSoup
page = open("test.html", "r").read()
soup = BeautifulSoup(page, "html.parser")
For print soup.div the result will be <div>{"key": "my "value"" }</div>.
For print soup.div.get_text() the result will be {"key": "my "value"" }.
In both cases I'm losing &quot;. Is there any way to keep it while using BeautifulSoup, especially when using get_text()?
This matters because next I want to parse the text as JSON,
so when I use json.dumps(soup.div.get_text()) it's not working.
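The behaviour can be reproduced with the standard library alone; here html.unescape stands in for the entity decoding BeautifulSoup performs during parsing. Once &quot; has been decoded to a literal quote, the text is no longer valid JSON, and naively re-escaping does not restore the original either, because it escapes every quote including the JSON delimiters:

```python
import html
import json

# What the <div> contains in the source vs. what a parser hands back:
raw = '{"key": "my &quot;value&quot;" }'
decoded = html.unescape(raw)  # entities are decoded on parsing
print(decoded)                # {"key": "my "value"" } -- not valid JSON

# Re-escaping hits *all* quotes, including the JSON delimiters:
print(html.escape(decoded, quote=True))

# The decoded text genuinely fails to parse as JSON:
try:
    json.loads(decoded)
except json.JSONDecodeError:
    print("not valid JSON")
```

So the entities cannot be recovered from get_text() output alone; the raw, unparsed markup is needed if the &quot; sequences are to be preserved.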

how to capture a dynamically updated web element value (document.title) in python?

I am using bottle to host a simple html page which changes the page title on load.
The HTML page code:
<html>
<head>
<title>title</title>
<script type="text/javascript">
function initialize() {
    var z = 1234;
    document.title = z;
}
</script>
</head>
<body onload="initialize();">
hi
</body>
</html>
My bottle hosting code:
from bottle import route, run, template

@route('/:anything')
def something(anything=''):
    return template('C:/test1.html')

run(host='localhost', port=8080)
I am trying to capture the updated document.title using Python.
So far I have tried urllib, mechanize, and HTMLParser, but all of them return "title" instead of 1234.
A sample of the mechanize code I have tried:
from mechanize import Browser
br = Browser()
br.open("http://localhost:8080/hello")
print br.title()
Please help me.
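The root cause: urllib, mechanize, and HTMLParser all operate on the HTML exactly as served; the <script> never executes, so the <title> text really is the literal string "title". A JavaScript-capable browser driver (Selenium, for example) would be needed to observe 1234. A standard-library sketch of what any static parser sees (the page string mirrors the served HTML above):

```python
from html.parser import HTMLParser

# The HTML as the server delivers it; the script is just text here.
page = """<html><head><title>title</title>
<script>document.title = 1234;</script></head>
<body onload="initialize();">hi</body></html>"""

class TitleParser(HTMLParser):
    """Collects the text content of the <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title = data

p = TitleParser()
p.feed(page)
print(p.title)  # title -- the JavaScript assignment never ran
```

This is why every static tool returns "title"; capturing 1234 requires actually running the page's JavaScript in a browser engine.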
