How do you make a PDF searchable with text in the sidebar? - python

I'm looking to create some PDFs from Python.
I've noticed that some PDFs have sidebar text that shows the context of each occurrence of a search term.
For example, searching for "dictionary" shows this in the sidebar:
Page 10: Assigning a value to an existing dictionary key simply replaces the old value with a new one.
How is that done?
Is there any way to convert existing PDFs so that they show this sidebar text?

If you use ReportLab to generate your PDFs, the library has facilities for adding the bookmarks you want. Check out the bookmarkPage method on page 54 of the documentation.

I believe what you're referring to are bookmarks. According to the first hit on Google, you can add them by hand with Acrobat Pro.
The DocBook XSL templates, when used with Apache FOP, can also generate them.

The PyQt GUI toolkit supports creating PDFs. See, for example, "Printing Rich Text with Qt".

Related

How to embed a local XLSX file in an HTML page with Python and Django

For a Python web project (with Django), I developed a tool that generates an XLSX file. For ergonomics and ease of use, I would like to embed this Excel file in my HTML page.
I first tried converting the XLSX to an HTML table with the xlsx2html Python library. It works, but since I can't set the cell sizes or trim the content during conversion, I end up with huge cells and tiny text.
I also found a way to embed an Excel window in a web page using the HTML embed tag that OneDrive provides, but since my file lives in my code base rather than on Excel Online, I can't use it. That's a pity, because the display is perfect and users don't need to interact with the table.
I have searched extensively for other methods, but apart from writing a function that walks through my file and generates the HTML table markup row by row, there seems to be no simple way to convert or display it on my page.
I'm not used to this kind of requirement and wonder whether there is a cleaner way to display an Excel file in HTML.
Does it make sense to write a function that builds the HTML table as a string? Or should I find a library that does it? Is there perhaps a Django-specific library?
Thanks for sharing your experience.
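Writing the function yourself is not much code, and it lets you control cell sizes through CSS and trim long values before they reach the page. Below is a minimal sketch. It assumes the rows have already been read from the workbook, for example with openpyxl's worksheet.iter_rows(values_only=True). The function name rows_to_html_table, the max_chars limit, and the xlsx-table CSS class are hypothetical.

```python
from html import escape


def rows_to_html_table(rows, max_chars=40, css_class="xlsx-table"):
    """Build an HTML <table> string from an iterable of row tuples.

    The first row is treated as the header. Every value is HTML-escaped,
    None becomes an empty cell, and long values are trimmed to max_chars
    characters (including a trailing ellipsis).
    """
    def cell(value):
        text = "" if value is None else str(value)
        if len(text) > max_chars:
            text = text[: max_chars - 1] + "\u2026"  # trim and mark with an ellipsis
        return escape(text)

    rows = list(rows)
    if not rows:
        return f'<table class="{escape(css_class)}"></table>'
    header, body = rows[0], rows[1:]
    parts = [f'<table class="{escape(css_class)}">', "<thead><tr>"]
    parts += [f"<th>{cell(v)}</th>" for v in header]
    parts.append("</tr></thead><tbody>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{cell(v)}</td>" for v in row) + "</tr>")
    parts.append("</tbody></table>")
    return "".join(parts)


html_table = rows_to_html_table([("Name", "Qty"), ("Widget <A>", 3), ("Gadget", None)])
```

Because the values are already escaped, the string would have to be marked safe (for example with django.utils.safestring.mark_safe) before it goes into a template. Alternatively, you can pass the rows themselves to the template and loop over them with {% for %}. Django's autoescaping then handles the escaping, and a CSS rule such as table-layout: fixed controls the column widths.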

Report builder for Django

Is there something like Stimulsoft or Crystal Reports for Django? I'm not talking about a report viewer that just exports some Excel data. I mean a whole package: text with variables, tables, pages with headers, footers, watermarks, and so on.
I want a footer on every page. I also need tables whose length I don't know in advance. They may run onto a second or third page, and each new page must be generated with the footer, just like in Stimulsoft Reports.
You can use ReportLab, which has such features [read it!]. However, I haven't found a complete package that connects Django models to report generation. In ReportLab you can define page templates and fill them with data. For Persian, you'll need external packages for RTL reshaping.
Check out ReportBro. Commercial use requires a license. I'm not affiliated with it, but I'm currently evaluating it for my own project, and it seems to offer everything you're looking for.

Python pisa/xhtml2pdf messy rendering

In a Django project, I want to generate an HTML page from a view and convert the generated HTML/CSS to PDF. I'm using xhtml2pdf for this (https://github.com/chrisglass/xhtml2pdf/blob/master/doc/usage.rst#using-xhtml2pdf-in-django).
Browser -> Django view -> MySQL DB -> Django template -> HTML/CSS -> PDF
I have made sure that:
I use a function (link_callback) to convert all relative paths to absolute ones, so xhtml2pdf can retrieve all the images it needs.
Instead of relying on a <link> tag to include the CSS (which does not work), I use a CSS @import rule with an absolute path to the CSS file (see "CSS not rendered by Pisa's pdf generation in Django").
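A link_callback like the one mentioned above receives each URI that xhtml2pdf encounters (plus a rel argument) and returns something xhtml2pdf can open, typically a local file path. Below is a minimal sketch of such a callback. It assumes static and media files live in local directories. In a Django project, the prefixes and roots would come from settings.STATIC_URL/STATIC_ROOT and settings.MEDIA_URL/MEDIA_ROOT. The factory name make_link_callback is hypothetical.

```python
import os


def make_link_callback(prefix_to_root):
    """Return a (uri, rel) -> path callback that maps URL prefixes to local directories.

    prefix_to_root is e.g. {"/static/": "/srv/app/static", "/media/": "/srv/app/media"}.
    URIs that match no prefix (such as absolute http:// URLs) are returned unchanged.
    """
    def link_callback(uri, rel):
        for prefix, root in prefix_to_root.items():
            if uri.startswith(prefix):
                root = os.path.realpath(root)
                path = os.path.realpath(os.path.join(root, uri[len(prefix):]))
                # Refuse paths that escape the root directory (e.g. via "..").
                if os.path.commonpath([root, path]) != root:
                    raise ValueError(f"{uri!r} resolves outside {root!r}")
                if not os.path.isfile(path):
                    raise FileNotFoundError(path)
                return path
        return uri

    return link_callback
```

In the view, the callback would then be passed as pisa.CreatePDF(html, dest=response, link_callback=make_link_callback({settings.STATIC_URL: settings.STATIC_ROOT, settings.MEDIA_URL: settings.MEDIA_ROOT})). Note that STATIC_ROOT is only populated after collectstatic has been run.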
The CSS file is taken into account, since some style elements show up in the output. However, the generated PDF is very different from the HTML output. Images are all messed up (partly visible and partly just outside the page), forms are not respected, font sizes are wrong, and <ul> lists are not properly rendered. I also had to remove a -moz-placeholder rule from the CSS because xhtml2pdf did not handle it properly.
Are there known issues with how xhtml2pdf interprets CSS? Are there restrictions?
I have already spent a lot of time tuning the CSS to work in Chrome, Firefox, and IE7, and I don't want to spend another round adapting it for xhtml2pdf. Is there a reliable way to convert HTML/CSS rendered from a Django template to PDF? Even a special kind of link that triggers the browser's "print to PDF" function would do.
And no, I don't want to use ReportLab and draw squares and circles, thank you!

Exclude first page from Table of Contents in pisa / xhtml2pdf

I'm using django-xhtml2pdf to generate a report. The first page is a cover sheet, followed by the table of contents, which is generated with the <pdf:toc /> tag.
I would like to exclude the first page from the count, so that the page numbers in the table of contents start at 1 instead of 2.
Is this possible?
From reading the xhtml2pdf code, there is no support for offsetting the page numbering. There is an old discussion about a pisa fork that tried to implement this, but I'm not sure how far it got.
An awkward but straightforward solution is to generate the cover sheet and the rest of the document as separate PDFs and then merge them. That way the page numbering excludes the cover sheet. The question "pyPDF merging and displaying as httpresponse through django" has an accepted answer that shows how to do just that.

Does Django have a template tag that can detect URLs and turn them into hyperlinks?

When someone writes a post and pastes a URL into it, can Django detect it and render it as a hyperlink rather than plain text?
Django has the urlize template filter which will automatically detect both URLs and email addresses and turn them into the appropriate hyperlinks.
The documentation is a little thin, so I recommend also reading the docstring of the urlize function in the Django source.
urlize:
http://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#urlize
Another option is to parse the plain text as a markup language, for example reStructuredText (my favourite) or Markdown (Stack Overflow uses a slightly modified variant of Markdown). Both turn valid plain-text link targets into hyperlinks. They also give you more control over formatting, so you won't need to resort to HTML for basic formatting. Also note that, as the urlize documentation states, urlize should only be used on plain text; it's not designed to be mixed with HTML.
