I have xml configs which are very complex. They are validated using dtds. I am looking for some application which reads the dtds and provides a GUI interface to write xmls. So that we don't have to write xmls by hand. If there is nothing existing already, how to start with developing one?
did you try Eclipse?
You can edit XML files according to the DTD.
If your audience is not development oriented, you can have a look to Editix XML Editor (the free version is already fully usable) or Notepad++ (a strange mix between the classic windows notepad and eclipse ;) ).
Otherwise, if you want to write come code, have a look to wxPython there are some widgets to edit/view HTML and XML.
Take a look at oXygen XML Author. It's not intended for programmers/developers like oXygen XML Editor is. It's a WYSIWYG XML authoring tool. You can create CSS stylesheets to display the XML in a way that is easy for the authors to create/modify the XML.
It's not free and you'll have to create CSS for display, but it is much better than trying to develop something from scratch. It also supports creating templates so that authors can start with a base XML file and then modify/update.
Related
I am looking for a way to convert HTML text to RTF string. Is there any libraries that does this job. I get html content dynamically in my project and need it to be rendered in RTF format. I am using HTML parser to convert HTML text to normal string and then have trying to use PyRTF for conversion to RTF format. Is there any better way that this can be done.Thanks in advance.
RTF seems a dicey format to convert from/to. I've tried cutting and pasting among applications on Mac OS X, for example, where RTF is something of a lingua franca. Some of those apps are Microsoft apps (relevant in that RTF is a Microsoft-developed format), others are not. Even basic formatting information like font size, font face, line spacing, and list styling (ordered or unordered) is jumbled when copying from one ostensibly RTF-speaking app to another. Simply put, it's a mess.
I have searched for ways to programmatically read, write, and transform RTF, preferably from Python. I found a number of packages on PyPI, trying them out has been a disappointing experience. They would support RTF 1.5, say, when the current version is 1.9.1. RTF has been around a long time, but a 2005-vintage spec is not very recent. There were lots of gotchas and incompatibilities. LOTS.
Now, I'm not saying it's impossible, or that there aren't other libraries out there that would do the trick. I have not tried the zopyx.convert mentioned by others here, for example. Maybe it's great. But looking at its dependencies--Java, FOP, etc.--it looks like a pretty complex (and thus likely fragile) toolchain. I read its code on github, and the Python is really only there as a coordination veneer. It organizes external tools XFC, XINC, FOP, and PrinceXML--three of the four of which are commercial software. That includes the key XFC part that deals with RTF. Color me skeptical.
There are two converters that I've found are worth a look: If you're using a Mac, the textutil command line program is actually one of the better and simpler tools I've seen.
textutil -convert html filename.rtf -output filename.html
The other formatting engine that's worth considering is LibreOffice. It's free, open source, reasonably amenable to automation, and a decent foundation as an interoperability hub. That's not just a guess; I've built complex, multi-format document workflows around it.
I would question why you're trying to get into RTF. That seems like a document format you'd be trying to escape from. But if you need to go there, textutil and LibreOffice are the least-worst mechanisms I've found.
There is a wonderful python library that comes as a tarball.
You can download it at https://pypi.python.org/pypi/zopyx.convert2/2.4.5.
Good luck!
I see this question is over a year old, but figured I'd contribute anyway. I recently had a similar requirement, and turned to PyRTF, a small but powerful Python module that can construct RTF documents from a text file. You could use Beautiful Soup to scrape the HTML, going down the parse tree tag by tag, and use the PyRTF API to construct appropriate objects (table, cell, paragraph, section or document).
The API itself is quite granular, and allows for a whole bunch of custom formatting (font text, alignment, color, headers, footers etc.)
Hope this helps.
I use Pisa/xhtml2pdf in my Django apps to generate pdf from an HTML source. That is:
I generate the HTML file formatted with all 'printing' stuffs (e.g. page-breaks, header, footer, etc.)
I convert this HTML into pdf using Pisa
This process is ok but it is slow (expecially when dealing with long tables) and I must use HTML/CSS according to Pisa features/limitations.
The question is: is this the right way to generate pdf from a web application (i.e. create HTML and then convert it to pdf) or there is a more direct way, that is "write" the pdf with a more suitable language?
WeasyPrint author here. The point of using HTML/CSS to generate PDF (vs. using a lower-level PDF library directly.) is to get automatic layout. It lets you specify high-level constraints like h1 { page-break-after: avoid } and let the layout engine figure it out, rather than specifying the absolute position of everything. The former is much more maintainable when you make changes to your documents.
Some tools like rst2pdf have their own stylesheet syntax, but that’s just a bad way of re-inventing CSS.
But yes, dumping complex stylesheets made for screen might not give great results. It’s better to build the stylesheets with print in mind, or even use completely different stylesheets with #media print in CSS or <link media="print"> in HTML.
I think generating a pdf from html with libraries like Pisa or http://weasyprint.org/ is the simplest approach. because it takes care of inserting images, css, barcode (on pisa) ... etc
If you want to write the pdf yourself take a look at Reportlab but it will take much longer to implement. In both cases i suggest to always generate the pdf in the background with celery or python-rq for optimization.
Pisa is known having various issues - especially with long tables. In general one should avoid using PISA. Other options are:
using Reportlab directly
z3c.rml (Reportlab template language clone)
commercial alternatives:
PrinceXML
PDFreactor
The general rule when it comes to PDF production: you get what you pay for.
Converters like Pisa or Apache FOP are half-baked solutions that work for simple cases but suck in general.
You can also use the QT webkit rendering engine to create PDFs from HTML with http://code.google.com/p/wkhtmltopdf/ and django-wkhtmltopdf.
The advantage is that you can write the HTML and CSS as you would normally for WebKit. This works well if you are outputting an existing web page but may be less appropriate if generating PDFs from scratch.
I need to learn/write XML and I already have python downloaded as I am also learning python. I did notice that there was another question on stackoverflow about xml writers and python but I didn't get the idea that there was real consensus on what's easiest to use? That is, I would ideally like an XML editor that highlights my errors and helps with formatting as it can be very tedious. Should I stick with python's element tree xml app or download one of the following that I was told about? Thanks.
http://netbeans.org/downloads/index.html
download Java SE and just use its editor
XML Notepad 2007
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=72d6aa49-787d-4118-ba5f-4f30fe913628&DisplayLang=en#AffinityDownloads
You're asking about two separate things.
XML Notepad and the NetBeans utility are apps for visually creating and editing XML.
Python's ElementTree is a library for programmatically creating and parsing XML.
Which one you need depends on what you want to do - create XML in an editor, or do it inside your program.
I use IntelliJ (community edition should be fine) and emacs for XML editing.
I used the Altova XMLSpy family back in the days I used windows.
I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
From doing a bit of search, it seems that I can use reportlab for creating content and pyPdf for merging PDF's together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other side there's pyPdf which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
Look into docutils and reSTructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py
I've used this, it creates very beautiful documents and the markup is extensible! Later you could take the same code and run it into rst2html to create a website out if it!
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
You could generate a document through, for example, TeX, or OpenOffice, or whatever gives you the most comfortable bindings and then print the document with a pdf printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.
I use cvs to maintain all my python snippets, notes, c, c++ code. As the hosting provider provides a public web- server also, I was thinking that I should convert the cvs automatically to a programming snippets website.
cvsweb is not what I mean.
doxygen is for a complete project and to browse the self-referencing codes online.I think doxygen is more like web based ctags.
I tried with rest2web, it is requires that I write /restweb headers and files to be .txt files and it will interfere with the programming language syntax.
An approach I have thought is:
1) run source-hightlight and create .html pages for all the scripts.
2) now write a script to index those script .htmls and create webpage.
3) Create the website of those pages.
before proceeding, I thought I shall discuss here, if the members have any suggestion.
What do do, when you want to maintain your snippets and notes in cvs and also auto generate it into a good website. I like rest2web for converting notes to html.
Run Trac on the server linked to the (svn) repository. The Trac wiki can conveniently refer to files and changesets. You get TODO tickets, too.
enscript or pygmentize (part of pygments) can be used to convert code to HTML. You can use a custom header or footer to link to the actual code for download.
I finally settled for rest2web. I had to do the following.
Use a separate python script to recursively copy the files in the CVS to a separate directory.
Added extra files index.txt and template.txt to all the directories which I wanted to be in the webpage.
The best thing about rest2web is that it supports python scripting within the template.txt, so I just ran a loop of the contents and indexed them in the page.
There is still lot more to go to automate the entire process. For eg. Inline viewing of programs and colorization, which I think can be done with some more trials.
I have the completed website here, It is called uthcode.