I recently started learning Python and I want to convert an existing HTML file to a PDF file. Strangely, pdfkit seems to be the only Python library for producing PDF documents.
import pdfkit
pdfkit.from_file("C:\\Users\\user\\Desktop\\table.html", "out.pdf")
An error occurs:
OSError: No wkhtmltopdf executable found: "b''"
How do I configure this library properly on Windows to make it work? I can't figure it out :(
It looks like you need to install wkhtmltopdf. For Windows, the installer can be found at https://wkhtmltopdf.org/downloads.html
Also check out this post from someone who had the same problem: Can't create pdf using python PDFKIT Error : " No wkhtmltopdf executable found:"
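Installing wkhtmltopdf is often only half of the fix on Windows: the error means pdfkit could not locate the executable, which happens when its folder is not on PATH. Below is a minimal sketch that passes the location to pdfkit explicitly through pdfkit.configuration. The install path is an assumption (adjust it to wherever wkhtmltopdf.exe actually lives on your machine), and find_wkhtmltopdf and html_file_to_pdf are hypothetical helper names.

```python
import os
import shutil

# Assumed install location on Windows; change it to match your machine.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
]


def find_wkhtmltopdf(candidates=DEFAULT_CANDIDATES):
    """Return a path to the wkhtmltopdf executable, or None if not found."""
    # Check the explicit candidate paths first.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # Fall back to PATH, which is where pdfkit looks by default.
    return shutil.which("wkhtmltopdf")


def html_file_to_pdf(html_path, pdf_path):
    """Convert an HTML file to PDF, telling pdfkit where the executable is."""
    import pdfkit  # third-party: pip install pdfkit

    exe = find_wkhtmltopdf()
    if exe is None:
        raise OSError("wkhtmltopdf is not installed or could not be located")
    config = pdfkit.configuration(wkhtmltopdf=exe)
    pdfkit.from_file(html_path, pdf_path, configuration=config)
```

With that in place, the call from the question becomes html_file_to_pdf("C:\\Users\\user\\Desktop\\table.html", "out.pdf").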
I found a working solution.
If you want to convert files to PDF format, just don't use Python for this purpose.
You need to include the Dompdf library in a PHP script on your local/remote server. Something like this:
<?php
// include autoloader
require_once 'vendor/autoload.php';
// reference the Dompdf namespace
use Dompdf\Dompdf;
if (isset($_POST['html']) && !empty($_POST['html'])) {
// instantiate and use the dompdf class
$dompdf = new Dompdf();
$dompdf->loadHtml($_POST['html']);
// (Optional) Setup the paper size and orientation
$dompdf->setPaper('A4', 'landscape');
// Render the HTML as PDF
$dompdf->render();
// Output the generated PDF to Browser
$dompdf->stream();
} else {
exit();
}
Then, in your Python script, you can post your HTML (or any other content) to your server and get the generated PDF file back as the response. Something like this:
import requests
url = 'http://example.com/html2pdf.php'
html = '<h1>hello</h1>'
r = requests.post(url, data={'html': html})
r.raise_for_status()  # stop here if the server returned an error
with open('converted.pdf', 'wb') as f:
    f.write(r.content)
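One caveat with this round trip: if the PHP side fails, the response body may be an HTML error page, and it would still be written to converted.pdf. A small sketch of a guard, assuming the server sends the PDF bytes unmodified from the first byte; save_pdf is a hypothetical helper name:

```python
def save_pdf(content, path):
    """Write content to path only if it looks like a PDF; return True on success."""
    # PDF files begin with the marker "%PDF-" followed by the version number.
    if not content.startswith(b"%PDF-"):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True
```

It would be called as save_pdf(r.content, 'converted.pdf') in place of the plain write.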
I have to work with a fairly basic Python 2.6 on a 'black box' appliance (so no Django or non-standard libraries).
I have to:
Send a bunch of html from the browser to the Python script on the server
Do some processing and convert to pdf using wkhtmltopdf
Return the PDF to the browser
I use two Python scripts - makePDF and getPDF
At the end of makePDF I have a valid /tmp/xxx.pdf sitting on the server - I can transfer it by SCP, it opens without issue in acrobat - no problem there (it should always be under 100k - 2mb in size btw).
My problem is sending the file back to the browser.
Here's getPDF:
#!/usr/bin/python
from tempfile import *
tempfile=gettempdir()+"/xxx.pdf"
f = open(tempfile, 'r')
pdf = f.read()
f.close()
print 'Content-Type: application/pdf'
print pdf
It looks like it should be working. If I watch the HTTP conversation in dev tools, I can see that a content length of 169k is returned, but it shows no response data. If I use my weapon of choice, the 'Advanced Rest Client' Chrome extension, I see a response that contains what looks like a kosher PDF file:
%PDF-1.4
1 0 obj
<<
/Title (��Briefing Pack)
/Creator (��)
/Producer (��wkhtmltopdf)
/CreationDate (D:20131101095256+10'30')
>>
... etc
The browser shows a "Failed to Load PDF Document" error.
I think it's fairly obvious that I'm an occasional Python user rather than a regular, so I suspect I'm missing something fairly basic...
It's working for me after adding a '\n' after application/pdf:
print "Content-type: application/pdf\n"
print pdf
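The reason the extra '\n' helps: a CGI response needs an empty line between the headers and the body, and print only adds a single newline on its own. Here is a minimal sketch of the same idea, written for Python 3 (an assumption, since the original appliance runs Python 2.6) and working with bytes throughout so the PDF is not altered on the way out. The function names are hypothetical, and the Content-Length header is an addition that the original script does not send.

```python
import sys


def build_cgi_response(pdf_bytes):
    """Return the complete CGI response: headers, an empty line, then the body."""
    headers = [
        b"Content-Type: application/pdf",
        b"Content-Length: " + str(len(pdf_bytes)).encode("ascii"),
    ]
    # The empty line after the headers is what the original script was missing.
    return b"\r\n".join(headers) + b"\r\n\r\n" + pdf_bytes


def send_pdf(path, out=None):
    """Read the PDF in binary mode and write the response to stdout."""
    if out is None:
        out = sys.stdout.buffer  # binary stdout, so no text encoding is applied
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    out.write(build_cgi_response(pdf_bytes))
    out.flush()
```

Opening the file with "rb" rather than 'r' matters mainly on platforms that translate line endings in text mode.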
I'm working on a Java-based project. The Java program runs a command to call a Python script.
The Python script uses tabula-py to read a PDF file and return the data.
The Python script works when I call it directly in a terminal (python3 xxx.py).
However, when I call the Python script from Java, it throws an error:
Error from tabula-java:Error: File does not exist
Command '['java', '-Dfile.encoding=UTF8', '-jar', '/home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', 'all', '--lattice', '--guess', '--format', 'JSON', '/home/ubuntu/Documents/xxxx.pdf']' returned non-zero exit status 1.
I tried calling the script by its full path, providing the PDF file by its full path, and sys.path.append(python script path); none of them worked.
I've also tried calling tabula with the java command directly, i.e. java -Dfile.encoding=UTF8 -jar /home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar "file_path"
That works and can read the file. However, calling the Python script from Java still does not work.
Is there any way to solve this? Using tabula directly in the Java program is not an option in my case.
Since you mention that you use Java for the base code and Python for reading the PDF, it's better to use Java entirely for more efficient code. Why? Because there are tools ready for you. There is absolutely no need to struggle with linking one language to another.
Code:
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
/**
 * This class is used to read an existing
 * PDF file using the iText jar.
 */
public class PDFReadExample {
    public static void main(String[] args) {
        try {
            // Create a PdfReader instance.
            PdfReader pdfReader = new PdfReader("D:\\testFile.pdf");
            // Get the number of pages in the PDF.
            int pages = pdfReader.getNumberOfPages();
            // Iterate through the pages (iText page numbers start at 1).
            for (int i = 1; i <= pages; i++) {
                // Extract the page content using PdfTextExtractor.
                String pageContent =
                        PdfTextExtractor.getTextFromPage(pdfReader, i);
                // Print the page content on the console.
                System.out.println("Content on Page "
                        + i + ": " + pageContent);
            }
            // Close the PdfReader.
            pdfReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I am trying to send a file from a Python script to my .NET Core web server.
In Python I am doing this using the requests library, and my code looks like this:
filePath = "run-1.csv"
with open(filePath, "rb") as postFile:
    file_dict = {filePath: postFile}
    response = requests.post(server_path + "/batchUpload/create", files=file_dict, verify=validate_sql)
print(response.text)
This code executes fine, and I can see the request fine in my webserver code which looks like so:
[HttpPost]
[Microsoft.AspNetCore.Authorization.AllowAnonymous]
public string Create(IFormFile file) //Dictionary<string, IFormFile>
{
var ms = new MemoryStream();
file.CopyTo(ms);
var text = Encoding.ASCII.GetString(ms.ToArray());
Debug.Print(text);
return "s";
}
However, the file parameter always returns as null.
Also, I can see the file parameter fine when the data is posted from Postman.
I suspect that this problem has to do with how .NET Core model binding works, but I'm not sure...
Any suggestions on how to get my file to show up on the server?
Solved my issue. The problem was that in Python I was keying my file in the upload dictionary by the actual file name "./run1.csv" rather than by the literal string "file".
Updating this fixed my issue.
file_dict = {"file": postFile}
This is what I believe @nalnpir mentioned above.
I figured this out by posting from Postman and also from my Python code to http://httpbin.org/post and comparing the responses.
The example from the requests docs is mostly correct, except that the key has to match the name of the parameter in the controller method signature.
url = 'https://www.url.com/api/post'
files = {'parameterName': open('filename.extension', 'rb')}
r = requests.post(url, files=files)
So in this case the controller action should be
[HttpPost]
public string Post(IFormFile parameterName)
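To make the pairing explicit, here is a small sketch in which the form field name is a parameter, so it is hard to confuse with the file name. build_files_payload and upload are hypothetical helper names, and the "text/csv" content type is an assumption; the tuple of (filename, file object, content type) is one of the forms that requests accepts for the files argument.

```python
import os


def build_files_payload(field_name, path, content_type="text/csv"):
    """Build the `files` argument for requests.post.

    field_name must equal the IFormFile parameter name on the server.
    The caller is responsible for closing the returned file object.
    """
    file_obj = open(path, "rb")
    # The dictionary key becomes the multipart field name; the first tuple
    # element is only the file name reported to the server.
    return {field_name: (os.path.basename(path), file_obj, content_type)}


def upload(url, path, field_name="file"):
    """Post the file at `path` under the given form field name."""
    import requests  # third-party: pip install requests

    files = build_files_payload(field_name, path)
    try:
        return requests.post(url, files=files)
    finally:
        files[field_name][1].close()
```

For the controller above, the call would be upload(url, 'filename.extension', field_name='parameterName').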
I'm trying to use Python 2.7 with the xmltodict package to get data from an App Engine API (XML type).
I have no idea how to do that...
I tried it with a local XML file (which I downloaded from the source URL) and it worked.
My local code looks like this:
import xmltodict
document = open(r"my local path\API_GETDATA.xml", "r")
read_doc = document.read()
xml_doc = xmltodict.parse(read_doc)
# Iterating over the parsed document yields its keys.
for i in xml_doc:
    print (xml_doc[i])
The result is that all XML fields are printed.
How can I make it work with a URL? Is there anything else I'm missing?
Use the Python library requests.
Install it with pip install requests and use it like this:
import requests
import xmltodict
r = requests.get("url")
xml_doc = xmltodict.parse(r.content)
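Since the goal was to print every field, a slightly fuller sketch may help. It assumes Python 3, and that the parsed document is made of nested dictionaries, with lists where an element repeats. iter_leaves and fetch_xml_as_dict are hypothetical helper names, and the 30 second timeout is an arbitrary choice.

```python
def iter_leaves(node, path=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = path + "/" + key if path else key
            yield from iter_leaves(value, child)
    elif isinstance(node, list):
        # Repeated XML elements show up as a list under one key.
        for index, value in enumerate(node):
            yield from iter_leaves(value, "%s[%d]" % (path, index))
    else:
        yield path, node


def fetch_xml_as_dict(url):
    """Download an XML document and parse it with xmltodict."""
    import requests   # third-party
    import xmltodict  # third-party

    response = requests.get(url, timeout=30)
    # Fail on 4xx/5xx instead of trying to parse an error page as XML.
    response.raise_for_status()
    return xmltodict.parse(response.content)
```

Printing all fields is then a loop such as: for path, value in iter_leaves(fetch_xml_as_dict(url)): print(path, value).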
I have just found wkhtmltopdf, an amazing HTML converter that uses WebKit. I have tried it on my dev machine; it's simple and works well.
How can this best be integrated with a Django-based site?
I found the Python bindings, but they presume a certain level of understanding of how to install things that I just don't have, e.g.:
you need libwkhtmltox.* somewhere in your LD path (/usr/local/lib)
you need the directory src/include/wkhtmltox from wkhtmltopdf somewhere on your include path (/usr/local/include)
After installing those Python bindings, how do I use them? What calls can I make?
Does the resulting PDF have to be saved to disk, or can I stream it out of a view somehow?
For example:
response['Content-Disposition'] = 'attachment; filename='+letter_name
response['Content-Type'] = 'Content-type: application/octet-stream'
response['Content-Length'] = bytes
return response
I would recommend django-wkhtmltopdf for this purpose. Their usage documentation gives a few examples of how to integrate it:
from django.conf.urls.defaults import *
from wkhtmltopdf.views import PDFTemplateView
urlpatterns = patterns('',
    # ...
    url(r'^pdf/$', PDFTemplateView.as_view(template_name='my_template.html',
                                           filename='my_pdf.pdf'), name='pdf'),
    # ...
)
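On the question of whether the PDF has to be saved to disk first: it does not. As an alternative to django-wkhtmltopdf (this is not how that package is used), the sketch below calls the wkhtmltopdf command-line tool directly. The tool accepts "-" for both input and output, meaning stdin and stdout. It assumes Python 3 and that wkhtmltopdf is on PATH. The function names, the sample HTML, and the file name letter.pdf are hypothetical.

```python
import subprocess


def build_wkhtmltopdf_command(executable="wkhtmltopdf"):
    """Command that reads HTML from stdin and writes the PDF to stdout."""
    # "-" as the input and as the output selects stdin and stdout.
    return [executable, "--quiet", "-", "-"]


def html_to_pdf_bytes(html, executable="wkhtmltopdf"):
    """Render an HTML string to PDF bytes without writing a file."""
    result = subprocess.run(
        build_wkhtmltopdf_command(executable),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return result.stdout


def pdf_view(request):
    """Hypothetical Django view that returns the PDF as a download."""
    from django.http import HttpResponse  # third-party

    pdf = html_to_pdf_bytes("<h1>hello</h1>")
    response = HttpResponse(pdf, content_type="application/pdf")
    response["Content-Disposition"] = 'attachment; filename="letter.pdf"'
    return response
```

In a real view the HTML would typically come from a rendered template rather than a literal string.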