xml writer/editor -- use python's version? - python

I need to learn and write XML, and I already have Python installed because I am learning it too. I noticed another Stack Overflow question about XML writers and Python, but I didn't get the impression that there was a real consensus on what is easiest to use. Ideally, I would like an XML editor that highlights my errors and helps with formatting, since writing XML by hand can be very tedious. Should I stick with Python's ElementTree XML library, or download one of the following that I was told about? Thanks.
http://netbeans.org/downloads/index.html
download Java SE and just use its editor
XML Notepad 2007
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=72d6aa49-787d-4118-ba5f-4f30fe913628&DisplayLang=en#AffinityDownloads

You're asking about two separate things.
XML Notepad and the NetBeans utility are apps for visually creating and editing XML.
Python's ElementTree is a library for programmatically creating and parsing XML.
Which one you need depends on what you want to do - create XML in an editor, or do it inside your program.
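To make the "inside your program" side concrete, here is a minimal sketch using the standard-library ElementTree module. The element names (catalog, book, title) are made up for illustration. It builds a small document, parses it back, and shows that the parser reports malformed XML with an error message, which is the closest a library gets to an editor highlighting your mistakes.

```python
import xml.etree.ElementTree as ET

# Build a small document programmatically (element names are illustrative).
root = ET.Element("catalog")
book = ET.SubElement(root, "book", id="b1")
ET.SubElement(book, "title").text = "Learning XML"

# Serialize to a string, then parse it back.
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
titles = [t.text for t in parsed.iter("title")]

# Malformed XML raises ParseError; the message includes a line and column.
error_message = None
try:
    ET.fromstring("<catalog><book></catalog>")
except ET.ParseError as exc:
    error_message = str(exc)

print(xml_text)
print(titles)
print(error_message)
```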

I use IntelliJ (the community edition should be fine) and Emacs for XML editing.
I used the Altova XMLSpy family back in the days when I used Windows.


Is there any way to read .one (OneNote) files using a Python script?

I am trying to read a .one file (a OneNote file) and write its content into a text file, but I haven't found a way to do it using Python. Please help me with this.
Try getting the content of your notes by calling:
./me/onenote/pages/1-1c13bcbae2fdd747a95b3e5386caddf1!1-xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/content?includeIDs=true&includeInkML=true&preAuthenticated=true
It will give you text/html, which you can parse with https://pypi.org/project/lxml/
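As a rough sketch of the parsing step: the answer above suggests lxml, but this version swaps in the standard-library html.parser so that nothing needs to be installed. The sample HTML string is invented for illustration and is not real OneNote output; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only nodes between tags.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up page content standing in for the text/html response.
sample = (
    "<html><head><title>Shopping</title></head>"
    "<body><p>Milk</p><p>Eggs &amp; bread</p></body></html>"
)
print(html_to_text(sample))
```

The returned string can then be written to a text file with an ordinary open(..., "w") call.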
I didn't find a good way to decode a .one file.
But I found a workaround:
Install OneNote 2016 and sync the contents.
Install the Evernote legacy version
Import data from OneNote to Evernote. (There's a button on GUI).
Export notes to html from Evernote.
Then you can do whatever you want. Yeah!
This isn't Python, but for other internet voyagers trying to escape Microsoft's iron grip this PowerShell script is pure magic.
https://passbe.com/2019/bulk-export-onenote-2013-2016-pages-as-html/

HTML to RTF string using Python

I am looking for a way to convert HTML text to an RTF string. Are there any libraries that do this job? I get HTML content dynamically in my project and need it rendered in RTF format. I am currently using an HTML parser to convert the HTML text to a plain string and then trying to use PyRTF to convert it to RTF. Is there a better way to do this? Thanks in advance.
RTF seems a dicey format to convert from/to. I've tried cutting and pasting among applications on Mac OS X, for example, where RTF is something of a lingua franca. Some of those apps are Microsoft apps (relevant in that RTF is a Microsoft-developed format), others are not. Even basic formatting information like font size, font face, line spacing, and list styling (ordered or unordered) is jumbled when copying from one ostensibly RTF-speaking app to another. Simply put, it's a mess.
I have searched for ways to programmatically read, write, and transform RTF, preferably from Python. I found a number of packages on PyPI, but trying them out has been a disappointing experience. They would support RTF 1.5, say, when the current version is 1.9.1. RTF has been around a long time, but a 2005-vintage spec is not very recent. There were lots of gotchas and incompatibilities. LOTS.
Now, I'm not saying it's impossible, or that there aren't other libraries out there that would do the trick. I have not tried the zopyx.convert mentioned by others here, for example. Maybe it's great. But looking at its dependencies--Java, FOP, etc.--it looks like a pretty complex (and thus likely fragile) toolchain. I read its code on github, and the Python is really only there as a coordination veneer. It organizes external tools XFC, XINC, FOP, and PrinceXML--three of the four of which are commercial software. That includes the key XFC part that deals with RTF. Color me skeptical.
There are two converters that I've found worth a look. If you're using a Mac, the textutil command-line program is actually one of the better and simpler tools I've seen:
textutil -convert html filename.rtf -output filename.html
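If you want to drive that command from Python, a thin subprocess wrapper is probably enough. This is only a sketch: the function names are hypothetical, the argument order simply mirrors the command line above, and it will only run on macOS, where textutil is available.

```python
import subprocess


def textutil_command(src, dst, fmt):
    """Build the textutil argument list, mirroring the command shown above."""
    return ["textutil", "-convert", fmt, src, "-output", dst]


def convert(src, dst, fmt):
    # check=True raises CalledProcessError if textutil exits with an error.
    subprocess.run(textutil_command(src, dst, fmt), check=True)


# Going the other way (HTML to RTF) just swaps the format and the file names.
print(textutil_command("filename.html", "filename.rtf", "rtf"))
```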
The other formatting engine that's worth considering is LibreOffice. It's free, open source, reasonably amenable to automation, and a decent foundation as an interoperability hub. That's not just a guess; I've built complex, multi-format document workflows around it.
I would question why you're trying to get into RTF. That seems like a document format you'd be trying to escape from. But if you need to go there, textutil and LibreOffice are the least-worst mechanisms I've found.
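For the LibreOffice route, a hedged sketch of the automation: LibreOffice can convert files from the command line in headless mode, and Python only needs to assemble the call. The binary name "soffice" and the file names are assumptions (some installs expose "libreoffice" instead), and HTML input in particular may need an explicit filter name, so check this against your own install.

```python
import subprocess


def soffice_command(src, target_format, outdir, binary="soffice"):
    """Build a headless LibreOffice conversion command.

    The binary name is an assumption; adjust it for your system.
    """
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", outdir, src]


def convert(src, target_format, outdir="."):
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(soffice_command(src, target_format, outdir), check=True)


print(soffice_command("report.odt", "rtf", "out"))
```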
There is a wonderful python library that comes as a tarball.
You can download it at https://pypi.python.org/pypi/zopyx.convert2/2.4.5.
Good luck!
I see this question is over a year old, but figured I'd contribute anyway. I recently had a similar requirement, and turned to PyRTF, a small but powerful Python module that can construct RTF documents from a text file. You could use Beautiful Soup to scrape the HTML, going down the parse tree tag by tag, and use the PyRTF API to construct appropriate objects (table, cell, paragraph, section or document).
The API itself is quite granular, and allows for a whole bunch of custom formatting (font text, alignment, color, headers, footers etc.)
Hope this helps.
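To illustrate the tag-by-tag idea, here is a minimal sketch that swaps in different tools: the standard-library html.parser instead of Beautiful Soup, and raw RTF control words instead of PyRTF objects. It is not the PyRTF API. It only handles p, b/strong and i/em, and only characters in the Basic Multilingual Plane; anything else would need extending.

```python
from html.parser import HTMLParser


def rtf_escape(text):
    """Escape RTF special characters and encode non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)
        elif code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            # \uN takes a signed 16-bit number, followed by a fallback char.
            out.append("\\u%d?" % (code - 65536 if code > 32767 else code))
        else:
            out.append("?")  # outside the BMP: not handled in this sketch
    return "".join(out)


class HtmlToRtf(HTMLParser):
    OPEN = {"b": "\\b ", "strong": "\\b ", "i": "\\i ", "em": "\\i "}
    CLOSE = {"b": "\\b0 ", "strong": "\\b0 ", "i": "\\i0 ", "em": "\\i0 ",
             "p": "\\par\n"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(self.CLOSE.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(rtf_escape(data))


def html_to_rtf(html):
    parser = HtmlToRtf()
    parser.feed(html)
    parser.close()
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0 "
    return header + "".join(parser.parts) + "}"


print(html_to_rtf("<p>Hello <b>world</b></p>"))
```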

create office files from python

We have a project in python with django.
We need to generate complex word, excel and pdf files.
For the rest of our projects, which were done in PHP, we used PHPExcel, PHPWord and TCPDF for PDF.
What Python libraries would you recommend for creating these kinds of files? (For Excel and Word it's important to use the Open XML file formats, xlsx and docx.)
Python-docx may help ( https://github.com/mikemaccana/python-docx ).
Python doesn't have highly developed tools to manipulate Word documents. I've found the Java library xdocreport ( https://code.google.com/p/xdocreport/ ) to be the best by far for Word reporting. Because I need to generate PCL, which is efficiently done via FOP, I also use docx4j.
To integrate this with my python, I use the spark framework to wrap it up with a simple web service, and use requests on the python side to talk to the service.
For excel, there's openpyxl, which actually is a python port of PHPexcel, afaik. I haven't used it yet, but it sounds ok to me.
I would recommend using Docutils. It takes reStructuredText files and converts them to a range of output files. Included in the package are HTML, LaTeX and .odf file writers but in the sandbox there are a whole load of other writers for writing to other formats, see for example, the WordML writer (disclaimer: I haven't used it).
The advantage of this solution is that you can write plain text (reStructuredText) master files, which are human readable as is, and then convert to a range of other file formats as required.
Whilst not a Python solution, you should also look at Pandoc, a Haskell library which supports a much wider range of output and input formats than Docutils. One major advantage of Pandoc over Docutils is that you can do the reverse translation, i.e. WordML to reStructuredText. You can try Pandoc here.
I have never used any libraries for this, but you can change the extension of any docx or xlsx file to zip, and see the magic!
Generating Open XML files is as simple as generating a couple of XML files (you can use templates) and zipping them.
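As a sketch of that idea, the following writes a bare-bones docx using only zipfile from the standard library. The three parts are the minimum I would expect a word processor to need; real documents carry many more (styles, settings, and so on), and I have not checked this against every Word version. The function name is hypothetical.

```python
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body>%s</w:body></w:document>'
)


def write_minimal_docx(path, paragraphs):
    """Write one plain paragraph per string; text is XML-escaped."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", DOCUMENT % body)


# Usage: write_minimal_docx("hello.docx", ["Hello world"])
```

The template approach mentioned above would replace the DOCUMENT string with a document.xml taken from a real file and substitute placeholders in it.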
Simplest way to generate PDF is to generate HTML (with CSS+images) and convert it using wkhtmltopdf tool.

How to create a GUI from DTD?

I have XML configs which are very complex. They are validated using DTDs. I am looking for an application which reads the DTDs and provides a GUI for writing the XML, so that we don't have to write it by hand. If nothing like that exists already, how should I start developing one?
Did you try Eclipse?
You can edit XML files according to the DTD.
If your audience is not development oriented, you can have a look at Editix XML Editor (the free version is already fully usable) or Notepad++ (a strange mix between the classic Windows Notepad and Eclipse ;) ).
Otherwise, if you want to write some code, have a look at wxPython; there are some widgets to edit/view HTML and XML.
Take a look at oXygen XML Author. It's not intended for programmers/developers like oXygen XML Editor is. It's a WYSIWYG XML authoring tool. You can create CSS stylesheets to display the XML in a way that is easy for the authors to create/modify the XML.
It's not free and you'll have to create CSS for display, but it is much better than trying to develop something from scratch. It also supports creating templates so that authors can start with a base XML file and then modify/update.

Is there a Python module for converting RTF to plain text? [closed]

Ideally, I'd like a module or library that doesn't require superuser access to install; I have limited privileges in my working environment.
I've been working on a library called Pyth, which can do this:
http://pypi.python.org/pypi/pyth/
Converting an RTF file to plaintext looks something like this:
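from pyth.plugins.rtf15.reader import Rtf15Reader
from pyth.plugins.plaintext.writer import PlaintextWriter
# Read the RTF file into Pyth's document model, then render it as plain text.
with open('sample.rtf', 'rb') as rtf_file:
    doc = Rtf15Reader.read(rtf_file)
# print() with a single argument behaves the same on Python 2 and Python 3.
print(PlaintextWriter.write(doc).getvalue())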
Pyth can also generate RTF files, read and write XHTML, generate documents from Python markup a la Nevow's stan, and has limited experimental support for latex and pdf output. Its RTF support is pretty robust -- we use it in production to read RTF files generated by various versions of Word, OpenOffice, Mac TextEdit, EIOffice, and others.
OpenOffice has a RTF reader. You can use python to script OpenOffice, see here for more info.
You could probably try using the magic com-object on Windows to read anything that smells ms-binary. I wouldn't recommend that though.
Actually parsing the raw data probably won't be very hard, see this example written in .bat/QBasic.
DocFrac is a free, open source converter between RTF, HTML and text. Windows, Linux, ActiveX and DLL versions are available. It will probably be pretty easy to wrap it up in Python.
RTF::TEXT::Converter - Perl extension for converting RTF into text (in case you have problems with DocFrac).
Official Rich Text Format (RTF) Specifications, version 1.7, by Microsoft.
Good luck (with the limited privileges in your working environment).
If you are on a Mac, you can convert an RTF file file.rtf to TXT from the CLI like:
textutil -convert txt file.rtf
Have you checked out pyrtf-ng?
Update: The parsing functionality is available if you do a Subversion checkout, but I'm not sure how full-featured it is. (Look in the rtfng.parser.base module.)
Here's a link to a script that converts rtf to text using regex:
Regular Expression for extracting text from an RTF string
Also, an updated link on GitHub:
Github link
There is a good library, pyrtf-ng, for all-purpose RTF handling.
PyRTF-ng 0.9.1 failed to parse either of my RTF documents; both raised a ParsingException.
The first document was generated with OpenOffice 3.4, the second with Mac TextEdit.
Pyth 0.5.6 parsed both documents without problems, but did not process Cyrillic symbols properly.
Yet each editor opens the other editor's document correctly and without trouble, so all of these libraries seem to have weak RTF support.
So I'm writing my own parser, with blackjack and hookers.
(I've uploaded both files, so you can check RTF libraries by yourself: http://yadi.sk/d/RMHawVdSD8O9 http://yadi.sk/d/RmUaSe5tD8OD)
I just came across pyrtflib. There's not much (any) documentation on it, so it's a case of installing it and then using the built-in help() function to find out what's available and what everything does.
Having said that, my little trial run of its rtf.Rtf2Html.getHtml() function went well enough. I haven't tried the Rtf2Txt function, but given the simpler nature of converting RTF to plain text, I'd expect it to do fine.
I ran into the same thing and I was trying to code it myself. It's not that easy, but here is what I had when I decided to go for a command-line app. It's Ruby, but you can adapt it to Python very easily.
There is some header garbage to clean up, but you can see more or less the idea.
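f = File.open('r.rtf', 'r')
b = 0       # current brace nesting depth
p = false   # true while skipping a control word
str = ''
begin
  while (char = f.readchar)
    if char.chr == '{'
      b += 1
      next
    end
    if char.chr == '}'
      b -= 1
      next
    end
    if char.chr == '\\'
      p = true
      next
    end
    # Whitespace ends a control word. Double quotes are needed here so that
    # "\n", "\t" and "\r" are real escape sequences rather than two characters.
    if p == true && (char.chr == ' ' or char.chr == "\n" or char.chr == "\t" or char.chr == "\r")
      p = false
      next
    end
    if p == true && (char.chr == '\'')
      # this is the source of my headaches. you need to read the code page from the header and encode this.
      p = false
      str << '#'
      next
    end
    next if b > 2
    next if p
    str << char.chr
  end
rescue EOFError
end
f.close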
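A Python adaptation of the same idea might look like the sketch below. It is not a line-by-line port: it tokenizes with a regular expression, skips a few header groups by name (which takes care of most of the header garbage), and decodes \'hh escapes with an assumed code page (cp1252 by default) instead of reading it from the header. It does not handle \uN Unicode escapes, so treat it as a starting point rather than a full parser.

```python
import re

# Groups whose content is metadata rather than body text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"     # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte such as \'e9
    r"|\\([^a-z])"              # control symbol such as \\ \{ \} \*
    r"|([{}])"                  # group delimiters
    r"|([^\\{}\r\n]+)"          # plain text (raw newlines are not text in RTF)
)


def rtf_to_text(rtf, codepage="cp1252"):
    out = []
    stack = []      # skip flags of the enclosing groups
    skip = False    # True while inside a group we are ignoring
    for m in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif not skip and word in ("par", "line"):
                out.append("\n")
            elif not skip and word == "tab":
                out.append("\t")
        elif symbol:
            if symbol == "*":
                skip = True     # \* marks an optional destination
            elif not skip and symbol in "\\{}":
                out.append(symbol)
        elif hexbyte:
            if not skip:
                out.append(bytes.fromhex(hexbyte).decode(codepage, errors="replace"))
        elif text and not skip:
            out.append(text)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Helvetica;}}\f0 Hello \b world\b0\par caf\'e9}"
print(rtf_to_text(sample))
```

For Cyrillic documents written with \'hh escapes, passing codepage="cp1251" is the likely adjustment, though the reliable fix is to read \ansicpg from the header.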
Conversely, if you want to write RTFs easily from Python, you can use the third-party module rtflib. It's a fairly new and incomplete module but still very powerful and useful. Below is an example that writes "hello world" in rich text to an RTF called helloworld.rtf. This is a very primitive example, and the module can also be used to add colors, italics, tables, and many other aspects of rich text to RTF files.
from rtflib import *
file = RTF("helloworld.rtf")
file.startfile()
file.addstrict()
file.addtext("hello world")
file.writeout()
