How to Convert a Local HTML File to PDF File in Python


I have looked for ways to convert my local HTML files to PDF in Python.
What I found was pdfkit, weasyprint, xhtml2pdf, and pdfcrowd.
The problem is that I need to package this as an exe file with PyInstaller so that someone else can run the program. With pdfkit I encountered an OSError saying:
OSError: No wkhtmltopdf executable found: "b''"
The solution I found requires editing some environment variables.
WeasyPrint also needs other things to be downloaded and installed.
I don't think those will work on another PC, since they need external setup before the program can run.
xhtml2pdf seems to be the one that converts HTML from a web page (not a local file) to PDF, and pdfcrowd is my last choice for now since I would have to pay to use the API.
Do you have any recommendations for converting these files given my circumstances?

Do you actually need to provide a Python package? Because frankly any modern browser should be able to print to a PDF, and that's significantly simpler than anything else if it suffices: have "someone" open the HTML page in their browser, print it, and select whatever option prints to PDF (Save as PDF in Chrome, Print to File in Firefox, ...).
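If the browser route is acceptable, it can also be scripted instead of done by hand. The following is only a sketch and not part of the answer above: it assumes a Chromium-based browser is installed on the target machine and relies on its --headless and --print-to-pdf command-line switches. The function names and the default executable name "chrome" are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_print_command(html_path, pdf_path, browser="chrome"):
    """Build a headless-browser command that prints a local HTML file to PDF.

    Assumption: `browser` names a Chromium-based executable found on PATH.
    """
    html_uri = Path(html_path).resolve().as_uri()  # file:///... URI
    pdf_target = Path(pdf_path).resolve()
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_target}",
        html_uri,
    ]


def html_to_pdf(html_path, pdf_path, browser="chrome"):
    """Run the command; raises CalledProcessError if the browser exits with an error."""
    subprocess.run(build_print_command(html_path, pdf_path, browser), check=True)
```

This still depends on a browser being present, so it does not make the exe fully self-contained, but most PCs already have one installed.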

Related

How to serve PDF files on the web?

I have the following code, which seems to serve a PDF without any content:
from pathlib import Path
pdf = Path("url/to/file.pdf")
print(f"Content-Type: application/pdf;\r\n")
print(pdf.read_bytes())
Any tips to correctly serve this PDF would be helpful!
Edit: for context, I am trying to serve PDF files and obscure original PDF file path on the server.

I'm no Python developer so I can't help you too specifically, but a couple things...
If your Python script is outputting headers and content in the same response (such as via CGI), you need to have a blank line between the headers and the content. Right now you have one \r\n. Add a second \r\n.
The other thing is that you should find a way to stream the output from that file rather than reading all its bytes and printing them.
Finally, I don't know if print() in Python is interpreting that as a string, but that can be problematic for binary data. This again is solved by piping a stream directly to the output.
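A minimal sketch of those three points in Python, assuming a CGI-style script that writes the response to standard output. The function name serve_pdf and its out parameter are hypothetical, chosen here for illustration.

```python
import shutil
import sys


def serve_pdf(pdf_path, out=None):
    """Write CGI headers, a blank line, then the PDF bytes to a binary stream."""
    if out is None:
        out = sys.stdout.buffer  # binary view of stdout; print() would write text
    # The header block ends with an empty line (two CRLF pairs in a row).
    out.write(b"Content-Type: application/pdf\r\n\r\n")
    with open(pdf_path, "rb") as pdf:
        # Copy in chunks instead of loading the whole file into memory.
        shutil.copyfileobj(pdf, out)
    out.flush()
```

Writing the header as bytes to the same binary stream also avoids any ordering problems between the text and binary layers of stdout.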

Figured out the answer to my own question, in case anyone else needs to know:
from pathlib import Path
from sys import stdout
pdf = Path("url/to/file.pdf")
print("Content-type: application/pdf\r\n\r\n")
stdout.flush()
stdout.buffer.write(pdf.read_bytes())

I really don't know what your question is about. It looks like the code is unrelated to the question. It seems it is supposed to find a PDF in the local file system (built-in open?) and then print its content(?). Do you use a framework (Flask/Django)?
If by dynamic serving you mean dynamic creation of a PDF based on some template:
A PDF may be constructed from a markup language such as HTML or a TeX file (LaTeX is unfortunately a complicated system and is not well suited to being deployed with a web app).
The markup file may in turn be rendered by a template engine (Jinja2, Django's built-in one).
https://weasyprint.org/ is a library that converts HTML + CSS to PDF.
P.S. Add more context.

How do I open/convert .pkz files?

A python package that I'm using has data stored under a single file with a .pkz extension. How would I unzip (?) this file to view the format of data within?

Looks like what you are referencing is just a one-off file format used in sample data in scikit-learn. The .pkz is just a compressed version of a Python pickle file which usually has the extension .pkl.
Specifically you can see this in one of their sample files here along with the fact they are using the zlib_codec. To open it, you can go in reverse or try uncompressing from the command line.
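"Going in reverse" could look like the sketch below. It assumes the file really is a pickle compressed with zlib, as described above; the helper name load_pkz is made up for illustration.

```python
import pickle
import zlib


def load_pkz(path):
    """Load a .pkz file, assuming it is a zlib-compressed pickle.

    Only unpickle files from a source you trust: pickle can run arbitrary code.
    """
    with open(path, "rb") as f:
        compressed = f.read()
    return pickle.loads(zlib.decompress(compressed))
```

If zlib.decompress raises an error, the file was probably written some other way and this assumption does not hold.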

Before attempting to open a PKZ file, you'll need to determine what kind of file you are dealing with and whether it is even possible to open or view the file format.
Files which are given the .PKZ extension are known as Winoncd Images Mask files, however other file types may also use this extension. If you are aware of any additional file formats that use the PKZ extension, please let us know.
How to open a PKZ file:
The best way to open a PKZ file is to simply double-click it and let the default associated application open the file. If you are unable to open the file this way, it may be because you do not have the correct application associated with the extension to view or edit the PKZ file.
If that works, great: you have a program installed that can open it. Let's say that program is called pkzexecutor.exe; with Python, you just have to do:
import subprocess
path_to_program = 'C:\\Windows\\System32\\pkzexecutor.exe'
path_to_file = 'C:\\Users\\Desktop\\yourfile.pkz'
# Launch the program with the file as its only argument
subprocess.call([path_to_program, path_to_file])

From the source code for fetch_olivetti_faces, the file appears to be downloaded from http://cs.nyu.edu/~roweis/data/ and originally has a .mat file extension, meaning it is actually a MATLAB file. If you have access to MATLAB or another program which can read those files, try opening it from there with the original file extension and see what that gives you.
(If you want to try opening this file in Python itself, then perhaps give this question a look: Read .mat files in Python )

Differences between rendering HTML in PyCharm and a text editor (Sublime Text)

I've got exactly the same files (HTML + CSS) in both PyCharm and Sublime Text, and the results of rendering them in Google Chrome are completely different.
Editing the CSS doesn't have any effect on the rendered HTML.
I have to build the project using Python Flask, but I want to start with HTML and CSS.
Does anybody know why I get different results from the same files?

Let me try a lucky guess, since I don't know exactly what is rendering differently: it could be the encoding of the file. In Sublime you can try saving the file with a different encoding to match the file saved in PyCharm.
File > Save with Encoding > [select]
If both files are otherwise completely identical, this is the only thing I can imagine.
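One way to test this guess is to check whether the two copies are really identical on disk. Files with the same visible text but different encodings will differ at the byte level. This is just a sketch using the standard library; the helper name same_bytes is made up for illustration.

```python
import filecmp


def same_bytes(path_a, path_b):
    """Return True if the two files have identical content, byte for byte."""
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

If this returns True for both the HTML and the CSS files, the difference is more likely to come from how the pages are served or cached than from the files themselves.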

When we run a PyCharm project it gives us the same link, and we have to clear the cache or cookies every time we open this link.

how to read ppt file using python?

I want to get the content (text only) in a ppt file. How to do it?
(For example, to get the content of a txt file, I just need to open and read it. What do I need to do to get information from ppt files?)
By the way, I know there is win32com on Windows. But I am now working on Linux; is there any possible way?

I found this discussion over on Superuser:
Command line tool in Linux to Extract Text From Word, Excel, Powerpoint?
There are several reasonable answers listed there, including using LibreOffice to do this (and for .doc, .docx, .pptx, etc, etc.), and the Apache Tika Project (which appears to be the 5,000lb gorilla in this solution space).
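For the newer .pptx format specifically, the text can also be pulled out with the Python standard library alone, because a .pptx file is a ZIP archive of XML parts. The sketch below is not from the linked discussion and does not handle the old binary .ppt format; such a file would first have to be converted to .pptx, for example with LibreOffice. The function name pptx_text is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_text(path):
    """Return one string per slide, ordered by slide file number.

    Note: file numbering may differ from the order shown in the presentation.
    """
    slides = []
    with zipfile.ZipFile(path) as archive:
        names = [n for n in archive.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort numerically so slide10 comes after slide9.
        names.sort(key=lambda n: int(re.search(r"\d+", n).group()))
        for name in names:
            root = ET.fromstring(archive.read(name))
            runs = [t.text or "" for t in root.iter(A_NS + "t")]
            slides.append(" ".join(runs))
    return slides
```

This only collects plain text runs from the slides; speaker notes and embedded objects are ignored.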

printing to windows printer with python or shell command

I'm trying to script an annoying task that involves fetching, handling and printing loads of scanned docs, JPEG or PDF. I haven't succeeded in accessing the printer from Python or from the Windows shell (which I could script with Python's subprocess module). I succeeded in printing a text file from the command line with the lpr command, but not a JPG or PDF.
I'd be glad for any clues about this, including a more extensive Windows shell reference for printing, or a suitable Python library I missed in my Google and Stack Overflow searches (which turned up just one unanswered question).

Well, after a little research I found some links that might help you:
1) To print images using Python Shell, this link below has some code using PIL that will, hopefully, do what you want:
http://timgolden.me.uk/python/win32_how_do_i/print.html
2) To print PDF files, this link may have what you need:
http://www.darkcoding.net/software/printing-word-and-pdf-files-from-python/
I have never done any of these things, but from a quick look I found these links and they seem to make sense. Hope it helps :)

I used this for an RTF file (just an idea):
subprocess.call(['loffice', '-pt', 'LaserJet', file])
I am using LibreOffice; it can print in batch mode.

With a default PDF viewer assigned on the system, you can do:
import win32api
fname="C:\\somePDF.pdf"
win32api.ShellExecute(0, "print", fname, None, ".", 0)
Note that this will only work on Windows and will not work with all PDF viewers, but it should be fine with Acrobat, Foxit and several other major ones.
