HTML to PDF with python and wkhtmltopdf

HTML to PDF with python and wkhtmltopdf - python

I have just found wkhtmltopdf, amazing html converter using webkit. I have tried it on my dev machine and its simple and works well.
How can this best be integrated with a django based site?
I found the python bindings, but they presume a certain level of understanding of how to install things I just don't have. e.g.
you need libwkhtmltox.* somewhere in your LD path (/usr/local/lib)
you need the directory src/include/wkhtmltox from wkhtmltopdf
somewhere on your include path (/usr/local/include)
After installing those python bindings, how do I use them? What calls can I do?
Does the resulting pdf have to be saved to the hd or can I stream it out of a view with something?
For example:
response['Content-Disposition'] = 'attachment; filename='+letter_name
response['Content-Type'] = 'Content-type: application/octet-stream'
response['Content-Length'] = bytes
return response

I would recommend django-wkhtmltopdf for this purpose. Their usage documentation gives a few examples on how to integrate:
from django.conf.urls.defaults import *
from wkhtmltopdf.views import PDFTemplateView
urlpatterns = patterns('',
# ...
url(r'^pdf/$', PDFTemplateView.as_view(template_name='my_template.html',
filename='my_pdf.pdf'), name='pdf'),
# ...
)

Related

Is it always correct to use URLs like "./about.html" or "../about.htm" instead of Absolute URLS like /about?

I'm a computer science student. Recently we were tasked to develop a static HTTP server from scratch without using any HTTP modules, solely depending on socket programming. So this means that I had to write all the logic for HTTP message parsing, extracting headers, parsing URLs, etc.
However, I'm stuck with some confusion. As I'm somewhat experienced in web development before, I'm used to using URLs in places like anchor tags like this "/about", and "/articles/article-1".However, I've seen people sometimes people to relative paths according to their folder structure like this. "./about.html", "../contact.html".This always seemed to be a bad idea to me. However, I realized that even though in my code I'm not supporting these kinds of URLs explicitly, it seems to work anyhow.
Following is the python code I'm using to get the path from the HTTP message and then get the corresponding path in the file system.
def get_http_url(self, raw_request_headers: list[str]):
"""
Method to get HTTP url by parsing request headers
"""
if len(raw_request_headers) > 0:
method_and_path_header = raw_request_headers[0]
method_and_path_header_segments = method_and_path_header.split(" ")
if len(method_and_path_header_segments) >= 2:
"""
example: GET / HTTP/1.1 => ['GET', '/', 'HTTP/1.1] => '/'
"""
url = method_and_path_header_segments[1]
return url
return False
def get_resource_path_for_url(self, path: str | Literal[False]):
"""
Method to get the resource path based on url
"""
if not path:
return False
else:
if path.endswith('/'):
# Removing trailing '/' to make it easy to parse the url
path = path[0:-1]
# Split to see if the url also includes the file extension
parts = path.split('.')
if path == '':
# if the requested path is "/"
path_to_resource = os.path.join(
os.getcwd(), "htdocs", "index.html")
else:
# Assumes the user entered a valid url with resources file extension as well, ex: http://localhost:2728/pages/about.html
if len(parts) > 1:
path_to_resource = os.path.join(
os.getcwd(), "htdocs", path[1:]) # Get the abslute path with the existing file extension
else:
# Assumes user requested a url without an extension and as such is hoping for a html response
path_to_resource = os.path.join(
os.getcwd(), "htdocs", f"{path[1:]}.html") # Get the absolute path to the corresponding html file
return path_to_resource
So in my code, I'm not explicitly adding any logic to handle that kind of relative path. But somehow, when I use things like ../about.html in my test HTML files, it somehow works?
Is this the expected behavior? As of now (I would like to know where this behavior is implemented), I'm on Windows if that matters. And if this is expected, can I depend on this behavior and conclude that it's safe to refer to HTML files and other assets with relative paths like this on my web server?
Thanks in advance for any help, and I apologize if my question is not clear or well-formed.

Python configure pdfkit on windows

I started to learn python recently and I want to convert existing html file to pdf file. It is very strange, but pdfkit seems to be the only lib for pdf docs for python.
import pdfkit
pdfkit.from_file("C:\\Users\\user\Desktop\\table.html", "out.pdf")
An error occurs:
OSError: No wkhtmltopdf executable found: "b''"
How to configure this lib properly on windows to make it work? I can't get it :(

It looks like you need to install wkhtmltopdf. For windows, the installer can be found at https://wkhtmltopdf.org/downloads.html
Also check out a post by this guy, who is having the same problem: Can't create pdf using python PDFKIT Error : " No wkhtmltopdf executable found:"

I found working solution.
If you want to convert files to pdf format just don't use python for this purpose.
You need to include DOMPDF library into your php script on your local/remove server. Something like this:
<?php
// include autoloader
require_once 'vendor/autoload.php';
// reference the Dompdf namespace
use Dompdf\Dompdf;
if (isset($_POST['html']) && !empty($_POST['html'])) {
// instantiate and use the dompdf class
$dompdf = new Dompdf();
$dompdf->loadHtml($_POST['html']);
// (Optional) Setup the paper size and orientation
$dompdf->setPaper('A4', 'landscape');
// Render the HTML as PDF
$dompdf->render();
// Output the generated PDF to Browser
$dompdf->stream();
} else {
exit();
}
Then in your python script you can post your html or whatever content to your server and get generated pdf file as a response. Something like this:
import requests
url = 'http://example.com/html2pdf.php'
html = '<h1>hello</h1>'
r = requests.post(url, data={'html': html}, stream=True)
f = open('converted.pdf', 'wb')
f.write(r.content)
f.close()

Flask-Uploads URL is always a 404

I'm using Flask-Uploads to upload a file and output the URL. However, for some reason the URL always points to a 404! I can see the file in the correct folder, but the URL seems to not be able to find it. Here are the configurations I'm using...
UPLOADS_DEFAULT_URL = os.environ.get("UPLOADS_URL", "http://localhost:5000/")
UPLOADS_DEFAULT_DEST = "app/uploads"
I also have an Upload set defined as:
productUploadSet = UploadSet('plist', extensions=('xlsx', 'csv', 'xls'))
The file is found in "app/uploads/plist/filename.csv" and I'll get a url returned to me like "http://localhost:5000/productlist/filename.csv" but whenever I open the URL it is always a 404. I know that the url method from Flask-Uploads doesn't actually check fi the file exists, but I can see the file is actually there. Is it looking in the wrong place somehow? Thanks for any help.

From Flask's guide to uploading files:
Now one last thing is missing: the serving of the uploaded files. As of Flask 0.5 we can use a function that does that for us:
from flask import send_from_directory
#app.route('/uploads/<filename>')
def uploaded_file(filename):
return send_from_directory(app.config['UPLOAD_FOLDER'],
filename)
Alternatively you can register uploaded_file as build_only rule and use the SharedDataMiddleware. This also works with older versions of Flask:
from werkzeug import SharedDataMiddleware
app.add_url_rule('/uploads/<filename>', 'uploaded_file',
build_only=True)
app.wsgi_app = SharedDataMiddleware(app.wsgi_app, {
'/uploads': app.config['UPLOAD_FOLDER']
})
Without implementing one of these, Flask has no idea how to serve the uploaded files.

Django - consume XML - RESTful

I have a python script running fine on my localhost. Its not an enterprise app or anything, just something I'm playing around with. It uses the "bottle" library. The app basically consumes an XML file (stored either locally or online) which contains elements with their own unique IDs, as well as some coordinates, eg mysite.com/23 will bring back the lat/long of element 23. I'm sure you're all familiar with REST at this stage anyway.
Now, I want to put this online, but have had trouble finding a host that supports "bottle". I have, however, found a host that has django installed.
So, my question is, how hard would it be to convert the following code from bottle to django? And can someone give me some pointers? I've tried to use common python libraries.
thanks.
from xml.dom.minidom import parseString
from bottle import route, run
import xml
import urllib
file = open('myfile.xml','r')
data = file.read()
dom = parseString(data)
#route('/:number')
def index(number="1"):
rows = dom.getElementsByTagName("card")[0].getElementsByTagName("markers")[0].getElementsByTagName("marker")
for row in rows:
if row.getAttribute("number") == str(number):
return str(xml.dumps({'long': row.getAttribute("lng"), 'lat': row.getAttribute("lat")}, sort_keys=True, indent=4))
return "Not Found"
run(host='localhost', port=8080)

I took your question as an opportunity to learn a bit more about Django. I used The Django Book as a reference.
Starting with an empty Django site (django-admin.py startproject testsite), I've changed urls.py to this:
from django.conf.urls.defaults import patterns, include, url
from testsite.views import index
urlpatterns = patterns('',
url(r'^(\d+)$', index),
)
And views.py to this:
from django.http import HttpResponse
from xml.dom.minidom import parseString
import xml
import urllib
def index(request, number):
data = open('myfile.xml', 'r').read()
dom = parseString(data)
rows = (dom.getElementsByTagName("card")[0]
.getElementsByTagName("markers")[0]
.getElementsByTagName("marker"))
for row in rows:
if row.getAttribute("number") == str(number):
return HttpResponse(str(xml.dumps({'long': row.getAttribute("lng"),
'lat': row.getAttribute("lat")}, sort_keys=True, indent=4)))
return HttpResponse("Not Found")
Caveat: I've not tested the XML code, only Django-related one, which I've tested via python manage.py runserver.
The Django Book contains a lot of information, including how to deploy this on a production server.

Read a file on App Engine with Python?

Is it possible to open a file on GAE just to read its contents and get the last modified tag?
I get a IOError: [Errno 13] file not accessible:
I know that i cannot delete or update but i believe reading should be possible
Has anyone faced a similar problem?
os.stat(f,'r').st_mtim

You've probably declared the file as static in app.yaml. Static files are not available to your application; if you need to serve them both as static files and read them as application files, you'll need to include 2 copies in your project (ideally using symlinks, so you don't actually have to maintain an actual copy.)
Update Nov 2014:
As suggested in the comments, you can now do this with the application_readable flag:
application_readable
Optional. By default, files declared in static file handlers are
uploaded as static data and are only served to end users, they cannot
be read by an application. If this field is set to true, the files are
also uploaded as code data so your application can read them. Both
uploads are charged against your code and static data storage resource
quotas.
See https://cloud.google.com/appengine/docs/python/config/appconfig#Static_Directory_Handlers

You can read files, but they're on Goooogle's wacky GAE filesystem so you have to use a relative path. I just whipped up a quick app with a main.py file and test.txt in the same folder. Don't forget the 'e' on st_mtime.
import os
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
class MainHandler(webapp.RequestHandler):
def get(self):
path = os.path.join(os.path.split(__file__)[0], 'test.txt')
self.response.out.write(os.stat(path).st_mtime)
def main():
application = webapp.WSGIApplication([('/', MainHandler)],
debug=True)
util.run_wsgi_app(application)
if __name__ == '__main__':
main()

+1 for the new "application_readable: true" feature. Before using this new feature I did run into an issue with GAEs' "wacky" file system while getting the NLP Montylingua to import.
Issue: Monty uses the open(filename,'rb') and a file pointer to file_ptr.read() in bytes from the static files. My implementation worked on my local windows system but failed upon deployment!
The fix: Specify the expected bytes to read file_ptr.read(4) #4 binary bytes
Appears to be something related to the 64 bit GAE server wanting to read in more (8 by default) bytes. Anyways, took a while to find that issue. Montylingua loads now.

I came up strange but working solution :) Jinja :)
Serving static files directly sometimes become a headache with GAE. Possible trade-off from performance let you move straigh forward with Jinja
- url: /posts/(.*\.(md|mdown|markdown))
mime_type: text/plain
static_files: static/posts/\1
upload: posts/(.*\.(md|mdown|markdown))
from jinja2 import Environment
from jinja2.loaders import FileSystemLoader
posts = Environment(loader=FileSystemLoader('static/posts/')) # Note that we use static_files folder defined in app.yaml
post = posts.get_template('2013-11-13.markdown')
import markdown2 # Does not need of course
class Main(webapp2.RequestHandler):
def get ( self ):
self.response.headers[ 'Content-Type' ] = 'text/html'
self.response.write ( markdown2.markdown( post.render()) ) # Jinja + Markdown Render function
Did you get it ;) I tested and It worked.

With webapp2, supposing you have pages/index.html at the same path as main.py:
#!/usr/bin/env python
import webapp2, os
class MainHandler(webapp2.RequestHandler):
def get(self):
path = os.path.join(os.path.split(__file__)[0], 'pages/index.html')
with open(path, 'r') as f:
page_content = f.read()
self.response.write(page_content)
app = webapp2.WSGIApplication([
('/', MainHandler)
], debug=True)

I can't see an answer for when the file hasn't been marked as static, and you're trying to read it in mode 'rt'; apparently that doesn't work. You can however open files just fine in mode 'rb', or just plain 'r'. (I wasted about 10 minutes on that 't'.)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

HTML to PDF with python and wkhtmltopdf - python

Related

Is it always correct to use URLs like "./about.html" or "../about.htm" instead of Absolute URLS like /about?

Python configure pdfkit on windows

Flask-Uploads URL is always a 404

Django - consume XML - RESTful

Read a file on App Engine with Python?

Categories

Resources