I can't insert tab stops into a docx generated from HTML


I have a very specific use case where I need to insert tab stops into Word documents. My code works perfectly on a docx that was created normally. However, the other part of my use case is that I extract the HTML from a text editor and turn it into a docx. With these documents generated from HTML, running the same code to insert tab stops does not work: the tab stop configuration gets created, but it is not applied to the document. I cannot find a way around it, and any help would be deeply appreciated.
Below is a code sample:
from docx import Document
from docx.shared import Inches
from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
from htmldocx import HtmlToDocx
new_parser = HtmlToDocx()
new_html = """<p><span>some text</span></p>
<br>
<p><span>Some persons name</span></p>
<p><span>Another text</span></p>
<p><span>Some date</span></p>"""
document = Document()
new_parser.add_html_to_document(new_html, document)
for para in document.paragraphs:
    tab_stops = para.paragraph_format.tab_stops
    tab_stops.add_tab_stop(Inches(5.51),
                           WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)
document.save('new-file-name.docx')
When running this code, the tab stop configuration gets created correctly in the docx, but it is not reflected in the document itself: the tab stops are not visible in the text.
This function is supposed to run on Azure Functions, so pywin32 is not an option for converting HTML to docx, as it does not run on Linux.
I have tried manually setting the styles of the document. I have also tried the ConvertAPI API and the Aspose.Words library, but nothing seems to work. It seems that something about converting HTML to docx precludes inserting tab stops.
Thank you very much in advance; any help is deeply appreciated.
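One thing that may be worth checking (this is an assumption, not something the question confirms): in WordprocessingML a tab stop is only a position defined in the paragraph properties, and it affects the layout only where a run contains a tab character that can advance to it. The sketch below uses just the standard library to count, for each paragraph in word/document.xml, the tab stops defined and the tab characters present. The helper name `tab_report` is hypothetical, and the sketch only counts `<w:tab/>` elements in runs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tab_report(docx_file):
    """Return one (tab_stops_defined, tab_characters) pair per paragraph.

    docx_file may be a path or a binary file-like object.
    tab_report is a hypothetical helper, not part of python-docx.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    report = []
    for para in root.iter(W + "p"):
        # Tab stop definitions: <w:pPr><w:tabs><w:tab .../></w:tabs></w:pPr>
        stops = para.findall(f"{W}pPr/{W}tabs/{W}tab")
        # Tab characters: a <w:tab/> element that is a direct child of a run
        chars = para.findall(f".//{W}r/{W}tab")
        report.append((len(stops), len(chars)))
    return report
```

If the second number is 0 for every paragraph of the HTML-generated file, the stops exist but nothing in the text uses them. In that case, adding a tab character to the paragraph text (for example a run whose text contains "\t") might be the next thing to try.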

Related

python - convert docx to HTML including Fonts and Fonts Size

I'm trying to convert a file from docx to HTML, including font family, font size and colors, in Python. I tried a couple of solutions, i.e. python-docx, docx2html and Mammoth, but none of the packages works for me. They do convert to HTML, but many style-related things, i.e. fonts, sizes and colors, are skipped.
I also opened the docx file with Python's zipfile module and read the XML of the Word file. All the docx information is in that XML, so now I'm thinking of converting the XML to HTML in Python; maybe I can find a parser for this purpose.
Here's the snippet of code that I tried with python-docx, but I'm getting None values here.
from docx import Document
d = Document('1.docx')
d_styles = d.styles
for key in d_styles:
    print(f'{key} : {d_styles[key]}')
For the XML approach using zipfile, here's my code snippet:
import zipfile
docx = zipfile.ZipFile(path)  # path points to the .docx file
content = docx.read('word/document.xml').decode('utf-8')
Any help will be highly appreciated.
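As a rough illustration of the zipfile route described above, the sketch below parses word/document.xml with the standard library and reports the font, size and color set directly on each run. The function name `run_formatting` is hypothetical, and the sketch deliberately ignores formatting inherited from word/styles.xml, so a None here only means that the value is not set on the run itself.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_formatting(docx_file):
    """Yield (text, font, size_pt, color) for every run in the document.

    Values not set directly on the run come back as None; they may
    still be inherited from a style defined in word/styles.xml.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    for run in root.iter(W + "r"):
        text = "".join(t.text or "" for t in run.findall(W + "t"))
        font = size_pt = color = None
        props = run.find(W + "rPr")
        if props is not None:
            fonts = props.find(W + "rFonts")
            size = props.find(W + "sz")
            col = props.find(W + "color")
            if fonts is not None:
                font = fonts.get(W + "ascii")
            if size is not None:
                # w:sz stores the font size in half-points
                size_pt = int(size.get(W + "val")) / 2
            if col is not None:
                color = col.get(W + "val")
        yield text, font, size_pt, color
```

Each tuple could then be turned into an HTML span with an inline style. Resolving inherited values would additionally require reading the style definitions, which this sketch does not attempt.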

How to update Page header using python odfdo module?

I am a complete beginner in Python. For a project, I am writing a Python script to update a template OpenDocument file using the odfdo module. I am having a hard time understanding how to update the page header. I looked into the odfdo documentation and found the 'get_page_headers' and 'set_page_headers' functions, but have not succeeded in using them. Could someone help me with this?
Thanks

This works for LibreOffice 6.4:
Get the master-page style. With that style loaded, you can just modify the page header.
from odfdo import Document, Style
doc = Document(testdoc)
# its master-page style has the page-header & footer (returns one element list)
mpstyle = doc.get_styles('master-page')[0]
# get the page_header style, you can take a look at the content
print(mpstyle.get_page_header().serialize())
# Now change the page header
mpstyle.set_page_header('New text')
# save your odt file
doc.save(moddoc, pretty=True)
Regards,
Robert

python 3.6 windows: retrieving the clipboard CF_HTML format

I want to copy some rich text, modify its source code (changing some tags and text, using regex and/or BeautifulSoup) and send it back to the clipboard. I'm looking for the easiest way to do that.
I tried win32clipboard, but it doesn't support the CF_HTML format (the Windows clipboard contains many formats).
So I'm looking for a module that could help me with this: if the CF_HTML clipboard format contains HTML, store it in a variable, do some operations on it, then send it back. (Optionally, do other stuff with other clipboard formats.)
Here is a Linux equivalent of what I'm looking for. It retrieves the HTML source when there is some in the clipboard (source):
#!/usr/bin/env python
import gtk
print (gtk.Clipboard().wait_for_contents('text/html')).data
Edit1: There is a workaround with pywin32 using this script. But is there a module able to do that directly (if CF_HTML contains data, get it, and send it back)?

The Edit1 solution actually seems to be the best.
Put the script above (HtmlClipboard.py) in the Python module folder: C:\Users\xxx\AppData\Local\Programs\Python\Python36\Lib\site-packages
Install pywin32, which provides win32clipboard.
With the 2 points above you could play with a script like this:
# get the CF_HTML clipboard content
import HtmlClipboard  # .py script found on GitHub
if HtmlClipboard.HasHtml():
    # print('there is HTML!!')
    dirty_HTML = HtmlClipboard.GetHtml()
    print(dirty_HTML)
    clean_HTML = dirty_HTML  # do what you want with it
    # put data to clipboard:
    HtmlClipboard.PutHtml(clean_HTML)
else:
    print('no html')
Bonus:
##get CF_TEXT from clipboard
import win32clipboard
win32clipboard.OpenClipboard()
text = win32clipboard.GetClipboardData(win32clipboard.CF_TEXT)
win32clipboard.CloseClipboard()
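For context on what a helper script like this has to handle: the CF_HTML data (registered on Windows under the name "HTML Format") is UTF-8 text that starts with a small header giving byte offsets such as StartHTML, EndHTML, StartFragment and EndFragment. The sketch below shows that layout with two hypothetical functions, `build_cf_html` and `extract_fragment`; it works on plain bytes and does not touch the clipboard or reproduce the HtmlClipboard script.

```python
import re


def build_cf_html(fragment):
    """Wrap an HTML fragment in a CF_HTML payload (UTF-8 bytes)."""
    header = (
        "Version:0.9\r\n"
        "StartHTML:{:010d}\r\n"
        "EndHTML:{:010d}\r\n"
        "StartFragment:{:010d}\r\n"
        "EndFragment:{:010d}\r\n"
    )
    prefix = b"<html><body><!--StartFragment-->"
    suffix = b"<!--EndFragment--></body></html>"
    body = fragment.encode("utf-8")
    # The offsets are fixed-width, so the header length does not depend on their values
    start_html = len(header.format(0, 0, 0, 0).encode("ascii"))
    start_frag = start_html + len(prefix)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(suffix)
    head = header.format(start_html, end_html, start_frag, end_frag)
    return head.encode("ascii") + prefix + body + suffix


def _offset(payload, key):
    """Read one numeric header field; the first match is the header line."""
    match = re.search(key.encode("ascii") + rb":(\d+)", payload)
    if match is None:
        raise ValueError(f"missing {key} in CF_HTML header")
    return int(match.group(1))


def extract_fragment(payload):
    """Return the HTML fragment of a CF_HTML payload as a string."""
    start = _offset(payload, "StartFragment")
    end = _offset(payload, "EndFragment")
    # Offsets count bytes, so slice before decoding
    return payload[start:end].decode("utf-8")
```

The offsets count bytes rather than characters, which is why the slice is taken on the raw bytes before decoding; non-ASCII text would otherwise shift the fragment boundaries.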

How to write a hyperlink to a local picture into a cell in openpyxl?

I use Python 2.7.3.
I need to write a hyperlink to a local picture into a cell with the openpyxl library.
When I need to add a hyperlink to a web site, I write something like this:
from openpyxl import Workbook
wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.worksheets[0]
ws.title = 'Name'
# hyperlink to a web site
ws.cell('B1').hyperlink = ('http://pythonhosted.org/openpyxl/api.html')
# hyperlink to a local picture
ws.cell('B2').hyperlink = ('1.png') # It doesn't work!
wb.save(filename = dest_filename)
I have 3 questions:
1. How can we write a hyperlink in the style of the VBA function, with both the hyperlink and its display name?
ActiveCell.FormulaR1C1 = _
"=HYPERLINK(""http://stackoverflow.com/questions/ask"",""site"")"
2. How can we write a hyperlink to a local image?
ws.cell('B2').hyperlink = ('1.png') # It doesn't work! And I don't know what to do.
Please help me.
3. Can we use Unicode hyperlinks to an image? For example, when I use
ws.cell('B1').hyperlink = (u'http://pythonhosted.org/openpyxl/api.html')
it fails with an error!
For example, we have a picture 'russian_language_name.png' and we create a hyperlink in Excel without any problem. We click on the cell and type
=Hyperlink("http://stackoverflow.com/questions/ask";"site_by_russian_language")
then save the document and unzip it. Then we go to its directory xl->worksheets->sheet1.xml and we see the header
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
and then ...
<row r="2" x14ac:dyDescent="0.25" spans="2:6"><c r="B2" t="str" s="1"><f>HYPERLINK("http://stackoverflow.com/questions/ask","site_by_russian_language")</f><v>site_by_russian_language</v></c>
Everything is OK. Excel supports Unicode, but what about Python's openpyxl library? Does it support Unicode in hyperlinks?

As the files in the .xlsx file are XML files with UTF-8 encoding, Unicode hyperlinks are not a problem.

About question 2: I think you need to include the full path of the file in the link.
If you cannot open the file link from your Excel file, it is Excel's security policy that prohibits such actions.

I answered a similar question. Hope this helps.
While there is no direct way to build a hyperlink, in your case we could do it this way. I was able to build a hyperlink to an existing file using the code below.
import openpyxl
wb = openpyxl.Workbook()
s = wb['Sheet']  # same sheet as get_sheet_by_name('Sheet'), which is deprecated
s['B4'].value = '=HYPERLINK("C:\\Users\\Manoj.Waghmare\\Desktop\\script.txt", "newfile")'
s['B4'].style = 'Hyperlink'
wb.save('trial.xlsx')
Setting the style attribute to 'Hyperlink' is the key; it would otherwise have a value of 'Normal'. The rest of my code may not matter much to you. The strange thing is that even without the style attribute the hyperlink was working; it just lacked the hyperlink styling. Hope this helps.
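The formula string above can also be assembled by a small helper, which makes it easier to pass arbitrary paths and display names. This is only a sketch: `hyperlink_formula` is a hypothetical name, and the one assumption built in is that a double quote inside an Excel string literal is written as two double quotes.

```python
def hyperlink_formula(target, label):
    """Build an Excel HYPERLINK formula string to use as a cell value.

    hyperlink_formula is a hypothetical helper, not part of openpyxl.
    """
    def quote(text):
        # Inside an Excel string literal, a double quote is written twice
        return '"' + text.replace('"', '""') + '"'

    return "=HYPERLINK({}, {})".format(quote(target), quote(label))
```

With the workbook from the answer, the result could be assigned as `s['B4'].value = hyperlink_formula(path, name)`, followed by the same `style = 'Hyperlink'` line. For a local picture, passing an absolute path as the target follows the earlier suggestion to use the full path.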

Is there a simple way to write an ODT using Python?

My point is that using either pod (from appy framework, which is a pain to use for me) or the OpenOffice UNO bridge that seems soon to be deprecated, and that requires OOo.org to run while launching my script is not satisfactory at all.
Can anyone point me to a neat way to produce a simple yet clean ODT (tables are my priority) without having to code it myself all over again ?
edit: I'm giving a try to ODFpy that seems to do what I need, more on that later.

Your mileage with odfpy may vary. I didn't like it. I ended up using a template ODT created in OpenOffice, opening content.xml with zipfile and ElementTree, and updating that. (In your case, it would create only the relevant table row and table cell nodes.) Then I wrote everything back.
It is actually straightforward, except for making ElementTree work properly with the XML namespaces (it is badly documented). But it can be done. I don't have the example, sorry.
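Since no example was given, here is a minimal sketch of the template approach using only the standard library. It is an illustration under assumptions, not the answerer's code: the function name `append_table_row` is hypothetical, the template is assumed to contain at least one table, and the new cells are plain text paragraphs. The namespace handling is done by registering every prefix declared in content.xml before serializing, so ElementTree writes `table:` and `text:` rather than generated prefixes such as `ns0:`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def append_table_row(template, output, values):
    """Copy an ODT template to output, adding one row to its first table.

    template and output may be paths or binary file-like objects.
    """
    with zipfile.ZipFile(template) as src:
        content = src.read("content.xml")

        # Register every prefix declared in the file so that the output
        # keeps the original prefixes instead of ns0:, ns1:, ...
        for _event, (prefix, uri) in ET.iterparse(
            io.BytesIO(content), events=("start-ns",)
        ):
            ET.register_namespace(prefix, uri)

        root = ET.fromstring(content)
        table = root.find(".//{%s}table" % TABLE_NS)
        if table is None:
            raise ValueError("template contains no table")

        # Build <table:table-row><table:table-cell><text:p>value</text:p>...
        row = ET.SubElement(table, "{%s}table-row" % TABLE_NS)
        for value in values:
            cell = ET.SubElement(row, "{%s}table-cell" % TABLE_NS)
            ET.SubElement(cell, "{%s}p" % TEXT_NS).text = value

        # Write every archive member back, replacing only content.xml
        with zipfile.ZipFile(output, "w") as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == "content.xml":
                    data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
                # Reusing the original ZipInfo keeps order and compression type
                dst.writestr(item, data)
```

A call such as `append_table_row('template.odt', 'report.odt', ['Name', '42'])` (file names are placeholders) would then produce a copy of the template with one extra row.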

My answer may not help with editing existing odt files, but if you want to create new odt files, you can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example showing how to write to an odt file:
>>> from PyQt4 import QtGui
# Create a document object
>>> doc = QtGui.QTextDocument()
# Create a cursor pointing to the beginning of the document
>>> cursor = QtGui.QTextCursor(doc)
# Insert some text
>>> cursor.insertText('Hello world')
# Create a writer to save the document
>>> writer = QtGui.QTextDocumentWriter()
>>> writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>> odf_format = writer.supportedDocumentFormats()[1]
>>> writer.setFormat(odf_format)
>>> writer.setFileName('hello_world.odt')
>>> writer.write(doc) # Return True if successful
True
QTextCursor can also insert tables, frames, blocks and images. More information at:
http://qt-project.org/doc/qt-4.8/qtextcursor.html
As a bonus, you can also print to a PDF file by using QPrinter.
