Insert variables into an HTML file - Python

I am trying to send an email using SendGrid. For this I created an HTML file and I want to format some variables into it.
Very basic test.html example:
<html>
<head>
</head>
<body>
Hello World, {name}!
</body>
</html>
Now in my Python code I am trying to do something like this:
html = open("test.html", "r")
text = html.read()
msg = MIMEText(text, "html")
msg['name'] = 'Boris'
and then proceed to send the email.
Sadly, this does not seem to work. Is there any way to make it work?

There are a few ways to approach this, depending on how dynamic the file needs to be and how many values you are going to insert. If it is just the single value name, then @furas is correct and you can simply write:
html = open("test.html", "r")
text = html.read().format(name="skeletor")
print(text)
And get:
<html>
<head>
</head>
<body>
Hello World, skeletor!
</body>
</html>
Alternatively, you can use Jinja2 templates; str.format gets awkward once the HTML contains other literal braces (for example inline CSS), since those would have to be escaped as {{ and }}.
import jinja2
html = open("test.html", "r")
text = html.read()
t = jinja2.Template(text)
print(t.render(name="Skeletor"))
Helpful links: Jinja website
Real Python Primer on Jinja
Python Programming Jinja
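Once the HTML has been rendered with either approach, the whole string goes into MIMEText as the message body; note that msg['name'] = 'Boris' in the question only sets an email header called "name", it never touches the body. A minimal sketch of the email side (the subject and addresses below are made-up placeholders, and the actual SendGrid/SMTP sending step is omitted):
from email.mime.text import MIMEText

with open("test.html") as f:
    text = f.read().format(name="Boris")

msg = MIMEText(text, "html")          # the rendered HTML becomes the body
msg["Subject"] = "Hello"              # placeholder headers
msg["From"] = "me@example.com"
msg["To"] = "boris@example.com"
# msg.as_string() can now be handed to your mail-sending code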

Related

Insert multiple lines of hyperlinks in HTML by Python

To display multiple lines in an HTML body, I use this simple code:
from string import Template

websites = ["https://www.reddit.com/","https://en.wikipedia.org/","https://www.facebook.com/"]
html = """
<!DOCTYPE html>
<html>
<body>
<h1>Hi, friend</h1>
<p>$websites!</p>
</body>
</html>
"""
html = Template(html).safe_substitute(websites = "<p>".join(websites))
Now I want to change the links to hyperlinks with friendly names.
names = ["News", "Info", "Media"]
Changed the line to:
<p><a href=$websites>$names</a></p>
and:
html = Template(html).safe_substitute(websites="<p>".join(websites),
                                      names="<p>".join(names))
What I want the HTML to show is:
News
Info
Media
But it doesn't show properly.
What's the right way to do that? Thank you.
Don't do '<p>'.join(websites). That creates a single string by joining all the elements of the list with '<p>' stuck between them,
so you get https://www.reddit.com/<p>https://en.wikipedia.org/<p>https://www.facebook.com/, which is not what you want (and I don't think it's valid HTML either).
You also don't have any <a> link tags, so you need to create those.
The href points to the website, and inside the <a> tag you put the name you want to appear:
<a href={link}>{link_name}</a>
This is what you want to do:
from string import Template

websites = ["https://www.reddit.com/","https://en.wikipedia.org/","https://www.facebook.com/"]
html = """
<!DOCTYPE html>
<html>
<body>
<p>$websites</p>
</body>
</html>
"""
tag_names = ['News', 'Info', 'Media']
a_links = '<br/>'.join([f'<a href={link}>{link_name}</a>' for link, link_name in zip(websites, tag_names)])
html = Template(html).safe_substitute(websites=a_links)
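As a small refinement (not part of the original answer), quoting the href values keeps the markup valid even if a URL ever contains characters that need it:
a_links = '<br/>'.join(
    f'<a href="{link}">{link_name}</a>'   # quoted attribute values
    for link, link_name in zip(websites, tag_names)
)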

Getting XPath from plain text

I'm trying to run XPath on text instead of a URL, but I keep getting the error "AttributeError: 'HtmlElement' object has no attribute 'XPath'".
See the code below.
from lxml import html
var ='''<html lang="en">
<head>
<title>Selecting content on a web page with XPath</title>
</head>
<body>
This is the body
</body>
</html>
'''
tree = html.fromstring(var)
body = tree.XPath('//*/body')
print(body)
It has been 15 years since I last used Python, but as far as I can tell, it is a case-sensitive language, and the xpath method is all lowercase.
So try this:
body = tree.xpath('//*/body')
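For completeness, here is the corrected snippet; xpath() returns a list of matching elements, so the body text can be pulled from the first result (the output in the comments is approximate):
from lxml import html

var = '''<html lang="en">
<head>
<title>Selecting content on a web page with XPath</title>
</head>
<body>
This is the body
</body>
</html>
'''

tree = html.fromstring(var)
body = tree.xpath('//*/body')        # lowercase xpath()
print(body)                          # [<Element body at 0x...>]
print(body[0].text.strip())          # This is the body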

Parsing MS-specific HTML tags in BeautifulSoup

When trying to parse an email sent using MS Outlook, I want to be able to strip the annoying Microsoft XML tags that it has added. One such example is the o:p tag. When trying to use Python's BeautifulSoup to parse an email as HTML, it can't seem to find these specialty tags.
For example:
from bs4 import BeautifulSoup
textToParse = """
<html>
<head>
<title>Something to parse</title>
</head>
<body>
<p><o:p>This should go</o:p>Paragraph</p>
</body>
</html>
"""
soup = BeautifulSoup(textToParse, "html5lib")
body = soup.find('body')
for otag in body.find_all('o'):
    print(otag)
for otag in body.find_all('o:p'):
    print(otag)
This will output no text to the console, but if I switched the find_all call to search for p then it would output the p node as expected.
How come these custom tags do not seem to work?
It's a namespace issue: apparently BeautifulSoup does not consider custom namespaces valid when parsing with "html5lib".
You can work around this with a regular expression, which, strangely, does work correctly:
import re

print(soup.find_all(re.compile('o:p')))
>>> [<o:p>This should go</o:p>]
But the "proper" solution is to change the parser to "lxml-xml" and introduce o: as a valid namespace:
from bs4 import BeautifulSoup
textToParse = """
<html xmlns:o='dummy_url'>
<head>
<title>Something to parse</title>
</head>
<body>
<p><o:p>This should go</o:p>Paragraph</p>
</body>
</html>
"""
soup = BeautifulSoup(textToParse, "lxml-xml")
body = soup.find('body')
print('this should find nothing')
for otag in body.find_all('o'):
    print(otag)
print('this should find o:p')
for otag in body.find_all('o:p'):
    print(otag)
>>>
this should find nothing
this should find o:p
<o:p>This should go</o:p>
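Since the original goal was to strip these Microsoft tags rather than just find them, a possible follow-up (a sketch built on the lxml-xml soup above, not part of the original answer) is to unwrap them so their contents are kept but the tags themselves disappear:
for otag in soup.find_all('o:p'):
    otag.unwrap()          # drop the <o:p> tag, keep its children/text
print(soup.body)
# roughly: <body><p>This should goParagraph</p></body>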

Using Python - Get a table out of some HTML and display it?

There's a lot of help on here but some of it goes over my head, so hopefully by asking my question and getting a tailored answer I will better understand.
So far I have managed to connect to a website, authenticate as a user, fill in a form and then pull down the HTML. The HTML contains a table I want. I just want to say something like:
read the HTML... when you reach the table start tag, keep going until you reach the table end tag, and then display that, or write it to a new HTML file and open it, keeping the tags so it's formatted for me.
Here is the code I have so far.
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    s.post(LOGINURL, data=login)
    # print
    r = s.get(LOGINURL)
    print r.url
    # An authorised request.
    r = s.get(APURL)
    print r.url
    # etc...
    s.post(APURL)
    #
    r = s.post(APURL, data=findaps)
    r = s.get(APURL)
    #print r.text
    f = open("makethisfile.html", "w")
    f.write('\n'.join(['<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">',
                       '<html>',
                       ' <head>',
                       ' <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">',
                       ' <title>THE TITLE</title>',
                       ' <link rel="stylesheet" href="css/displayEventLists.css" type="text/css">',
                       r.text  # this just does everything, I need to get the table.
                       ]))
    f.close()
Although it's best to parse the file properly, a quick-and-dirty method uses a regex.
import re

m = re.search("<table.*?>(.+)</table>", r.text, re.S)
if m:
    print m.group()
else:
    print "Error: table not found"
As an example of why parsing is better, the regex as written will fail with the following (rather contrived!) example:
<!-- <table> -->
blah
blah
<table>
this is the actual
table
</table>
As written it will also grab the first table in the file, but you could loop to get the 2nd, etc. (or make the regex specific to the table you want, if possible), so that's not a real problem.
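If you do want the "proper" parsing route, a minimal sketch with BeautifulSoup (assuming it is installed; the output filename is taken from the question) would be:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table")           # first <table>; use find_all() to pick another
if table is not None:
    out = open("makethisfile.html", "w")
    out.write("<html><body>%s</body></html>" % table)
    out.close()
else:
    print("Error: table not found")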

Django output of Word files (.doc) only shows raw HTML in the contents

I am writing a web app using Django 1.4. I want one of my views to output Microsoft Word docs using the following code:
response = HttpResponse(view_data, content_type='application/vnd.ms-word')
response['Content-Disposition'] = 'attachment; filename=file.doc'
return response
Then I can download file.doc successfully, but when I open the .doc file I only find raw HTML like this:
<h1>some contents</h1>
not a Heading 1 title.
I am new to Python and Django; I know this is maybe a problem with HTML escaping. Can someone please help me with this?
Thank you! :)
Unless you have some method of converting your response (HTML, I assume) into a real .doc file, all you will get is a text file containing your response with the extension .doc. If you are willing to switch to .docx files, there is a wonderful Python library called python-docx, built on lxml, that you should look into; it lets you generate well-formed .docx files.
Alternatively, use a template such as:
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<style>
@page Section1 {size:595.45pt 841.7pt; margin:1.0in 1.25in 1.0in 1.25in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;}
div.Section1 {page:Section1;}
@page Section2 {size:841.7pt 595.45pt; mso-page-orientation:landscape; margin:1.25in 1.0in 1.25in 1.0in; mso-header-margin:.5in; mso-footer-margin:.5in; mso-paper-source:0;}
div.Section2 {page:Section2;}
</style>
</head>
<body>
<div class=Section2>
<!-- Section1: Portrait, Section2: Landscape -->
[your text here]
</div>
</body>
</html>
According to this asp.net forum post, this should produce a valid .doc file when returned with the MIME type application/msword and a UTF-8 charset (so make sure your strings are all Unicode).
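Putting the two pieces together, a rough sketch of such a view (the template name 'word_export.html' and the context variable body are hypothetical; the skeleton above would need a {{ body|safe }} placeholder where '[your text here]' appears):
from django.http import HttpResponse
from django.template.loader import render_to_string

def export_doc(request):
    # 'word_export.html' is a hypothetical template based on the skeleton above
    html = render_to_string('word_export.html', {'body': '<h1>some contents</h1>'})
    response = HttpResponse(html, content_type='application/msword')
    response['Content-Disposition'] = 'attachment; filename=file.doc'
    return response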
