Add Section to OpenDocument Text file with ODFpy - python

I am using Python 2.7 and ODFpy to write an OpenDocument Text (ODT) file. Is there a way to use the existing ODFpy API to add sections (a la Format->Sections...) to the document? Is there a way to import them from another document and then populate them, or to otherwise fetch the styling from another document?

A section can be added to a document with something like this:
from odf import text as odftext
from odf import opendocument
# OpenDocumentText() creates a text document; its body is exposed as document.text
document = opendocument.OpenDocumentText()
# a section needs a name
document.text.addElement(odftext.Section(name="section1"))

Related

I can't insert tab stops into a docx generated from html

I have a very specific use case where I need to insert tab stops into Word documents. My code works perfectly when using a docx that was created normally. However, the other part of my use case is that I extract the html from a text editor and turn it into a docx. The problem is with these documents that were generated from html, for some reason when running the same code to insert tab stops it does not work. The tab stop configuration gets created but it is not applied to the document. I cannot seem to find a way around it and any help would be deeply appreciated.
Below is a code sample:
from docx import Document
from docx.shared import Inches
from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
new_html = """<p><span>some text</span></p>
<br>
<p><span>Some persons name</span></p>
<p><span>Another text</span></p>
<p><span>Some date</span></p>"""
document = Document()
new_parser.add_html_to_document(new_html, document)
for para in document.paragraphs:
    tab_stops = para.paragraph_format.tab_stops
    tab_stops.add_tab_stop(Inches(5.51),
                           WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)
document.save('new-file-name.docx')
When running this code, the tab stop configuration gets created correctly in the docx (it shows up in the paragraph's tab settings), but it is not reflected in the document itself: the tab stops are not visible in the text.
This function is supposed to run on Azure Functions, so pywin32 is not an option for converting HTML to docx, as it does not run on Linux.
I have tried manually setting the styles of the document. I have tried using the ConvertAPI API, as well as the aspose.words library, but nothing seems to work. It seems that there is something about converting HTML to docx that precludes inserting tab stops.
Thank you very much in advance and any help is deeply appreciated.

What's the best way to convert a .doc or .docx document to .odt using Python?

I've tried the following. It involves the use of the convertapi library.
import convertapi
import os
import tempfile
convertapi.api_secret = 'my_secret'
print('Converting from .doc(x) to .odt')
odt_result = convertapi.convert('odt', { 'File': '/mnt/c/Users/username/Documents/TGMC.docx' })
odt_result.file.save('/mnt/c/Users/fvsha/Documents/TGMC2.odt')
While it worked, I noticed a change in the automatic chapter numbering. The original document had chapters 1 through 5. The new odt document had chapters 1.1-1.5 followed by chapter 2.
Two questions: What caused this and could you recommend a way to convert the file from .doc/.docx to .odt using Python without having to manually clean up everything afterwards? Thank you.

Converting docx table into html (keeping all formatting) or an image to use in html

I've used python-docx to create some tables using a specified style format in my docx file. I now need to use these tables with this same formatting. Is there a way I can either convert the table including all of the formatting and styles, colours etc. to html? Or failing that a simple (automated) way of making the table into a figure which could be used?
To convert docx to HTML, use the code below.
Note that this code does not identify the tables and images in the docx. It converts docx to HTML but does not preserve tables and images.
import mammoth
# read the docx in binary mode and convert it to HTML
with open("docx_file.docx", 'rb') as docx_file:
    document = mammoth.convert_to_html(docx_file)
# document.value holds the generated HTML as a string
with open('html_filename.html', 'wb') as html_file:
    html_file.write(document.value.encode('utf8'))
To keep formatting and images, use the win32 package (pywin32, which drives Word on Windows) to convert docx to HTML.
import win32com.client
doc = win32com.client.GetObject("docx_InputFile.docx")
# FileFormat=8 is wdFormatHTML
doc.SaveAs(FileName="Html_FileName.html", FileFormat=8)
doc.Close()
I can't find a suitable solution that supports conversion with formatting and styles. But you could try converting the docx to JPG with the ConvertAPI DOCX to JPG API. The Python library and snippets for this service are at ConvertAPI/convertapi-python.

How to access data from pdf forms with python?

I need to access data from pdf form fields. I tried the package PyPDF2 with this code:
import PyPDF2
reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())
But this gives me only the text of the normal pdf data, not the form fields.
Does anyone know how to read text from the form fields?
You can use the getFormTextFields() method (named get_form_text_fields() on the PdfReader class used in the question) to return a dictionary of form fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field values). The following example might help:
from PyPDF2 import PdfReader
infile = "myInputPdf.pdf"
pdf_reader = PdfReader(infile)
dictionary = pdf_reader.get_form_text_fields() # returns a Python dictionary
my_field_value = str(dictionary['my_field_name']) # use field name (dictionary key) to access field value (dictionary value)
There are libraries in Python through which you can access PDF data. Because a PDF is not raw data like CSV, TXT or TSV, Python can't directly read the data inside PDF files.
One such Python library is slate. Read the Slate documentation; I hope you will find the answer to your question there.

Django/Python: Save an HTML table to Excel

I have an HTML table that I'd like to be able to export to an Excel file. I already have an option to export the table into an IQY file, but I'd prefer something that didn't allow the user to refresh the data via Excel. I just want a feature that takes a snapshot of the table at the time the user clicks the link/button.
I'd prefer it if the feature was a link/button on the HTML page that allows the user to save the query results displayed in the table. It would also be nice if the formatting from the HTML/CSS could be retained. Is there a way to do this at all? Or, something I can modify with the IQY?
I can try to provide more details if needed. Thanks in advance.
You can use the excellent xlwt module.
It is very easy to use, and creates files in xls format (Excel 2003).
Here is an (untested!) example of use for a Django view:
from django.http import HttpResponse
import xlwt
def excel_view(request):
    normal_style = xlwt.easyxf("""
        font:
            name Verdana
        """)
    # content_type replaces the older mimetype argument, which was removed in Django 1.7
    response = HttpResponse(content_type='application/ms-excel')
    wb = xlwt.Workbook()
    ws0 = wb.add_sheet('Worksheet')
    ws0.write(0, 0, "something", normal_style)
    # the workbook is written straight into the response body
    wb.save(response)
    return response
Use CSV. There's a module in Python ("csv") to generate it, and Excel can read it natively.
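A minimal sketch of the CSV suggestion, using only the standard library. The function name rows_to_csv and the sample rows are hypothetical stand-ins for the query results shown in the table:

```python
import csv
import io


def rows_to_csv(header, rows):
    """Return the table as CSV text that Excel can open."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # default dialect: comma-separated, \r\n line endings
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data standing in for the rows of the HTML table.
csv_text = rows_to_csv(["name", "total"], [["Widget, large", 3], ["Gadget", 5]])
```

In a Django view, this text could probably be returned as the body of an HttpResponse with content_type="text/csv". Note that CSV carries values only, so none of the HTML/CSS formatting survives.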
Excel supports opening an HTML file containing a table as a spreadsheet (even with CSS formatting).
You basically have to serve that HTML content from a Django view, with the content-type application/ms-excel as Roberto said.
Or if you feel adventurous, you could use something like Downloadify to prepare the file to be downloaded on the client side.
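The HTML-table approach described above could be sketched roughly as follows. This only builds the HTML body with the standard library; table_to_html, EXCEL_CONTENT_TYPE and the sample row are hypothetical names for illustration, and serving the result from a view is left to the framework:

```python
from html import escape


def table_to_html(header, rows):
    """Build a minimal HTML document containing a single table."""
    head_cells = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body_rows = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        '<html><head><meta charset="utf-8"></head><body>'
        f"<table><tr>{head_cells}</tr>{body_rows}</table>"
        "</body></html>"
    )


# Content type named in the answers above; the view would set it on the response.
EXCEL_CONTENT_TYPE = "application/ms-excel"

# Hypothetical data standing in for the query results.
html_body = table_to_html(["name", "total"], [["A & B", 3]])
```

Cell values are escaped so that characters such as & or < in the data cannot break the markup. Since the user gets a static copy of the table, there is no query to refresh, which is the snapshot behaviour asked for.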
