Does anybody have experience with odfpy?
I parsed a document with this Python package and got the paragraphs with their text and style names. Now I need to detect which font is used for the text in each paragraph.
Do you have any ideas?
The styles are defined separately from the text. The nodes that contain text will be inside nodes that have a style as an attribute. An example might look like this:
<text:p text:style-name="P5">
<text:span text:style-name="T1">Do donkeys eat macadamia nuts? And if they don't, why don't they?
</text:span>
</text:p>
In this example, two styles (P5 or T1) might specify a font for the text. You will need to look at the document's style definition section.
This code will create a dictionary containing the document's styles.
def get_styles(doc):
    styles = {}
    for ast in doc.automaticstyles.childNodes:
        name = ast.getAttribute('name')
        style = {}
        styles[name] = style
        # Attributes of the style element itself
        for k in ast.attributes.keys():
            style[k[1]] = ast.attributes[k]
        # Attributes of child elements, keyed as "element/attribute"
        for n in ast.childNodes:
            for k in n.attributes.keys():
                style[n.qname[1] + "/" + k[1]] = n.attributes[k]
    return styles
You can then examine the relevant styles that correspond to the text you care about. Inside each style will be a style:text-properties element, and that element will have a style:font-name attribute that specifies a font.
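As a rough sketch of that lookup: the dictionary built by get_styles stores child-element attributes under keys of the form "element/attribute", so the font would sit under "text-properties/font-name". The helper name find_font_name and the sample data below are hypothetical, and the sketch only looks at the styles passed in; it does not follow parent styles or styles defined elsewhere in the document.

```python
def find_font_name(styles, style_names):
    """Return the first font name found, checking style_names in order.

    `styles` is a dict shaped like the result of get_styles();
    `style_names` should list the innermost style first (span, then paragraph).
    """
    for name in style_names:
        font = styles.get(name, {}).get("text-properties/font-name")
        if font is not None:
            return font
    return None


# Hypothetical data shaped like the output of get_styles().
styles = {
    "P5": {"name": "P5", "family": "paragraph",
           "text-properties/font-name": "Liberation Serif"},
    "T1": {"name": "T1", "family": "text",
           "text-properties/font-weight": "bold"},
}

# T1 sets no font, so the paragraph style P5 decides.
font = find_font_name(styles, ["T1", "P5"])
```

If neither style names a font, the function returns None, which would mean the font comes from somewhere this sketch does not inspect.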
I am using the python-docx library to read an MS Word file (.docx). When I read a paragraph, I use the font property to get its style properties, but sometimes the font size comes back as None. Is there any way to get the actual font size of the paragraph's text?
The code I am using to parse paragraphs is given below:
from docx import Document
d = Document(document_path)
for paragraph in d.paragraphs:
    for run in paragraph.runs:
        print(run.font.size)
Short answer is no. What you're asking for is effective font size and python-docx can only see an explicitly set font size. When font.size reports None, it is the default for that paragraph, whatever that is, which depends on the style hierarchy.
In many cases it might be the font size of the applicable paragraph style, but the only way to know for sure is to traverse the style hierarchy for that text node to the first explicit definition.
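A minimal sketch of such a traversal is shown below. It assumes python-docx run and paragraph objects, using only their font.size, style, and base_style attributes; the function name effective_font_size is hypothetical. It checks the run itself, then the run's character style chain, then the paragraph style chain. It does not look at table styles or document-wide defaults, so a None result means the size is set somewhere this sketch does not reach.

```python
def effective_font_size(run, paragraph):
    """Return the first explicitly set font size, or None if none is found.

    Search order: the run, the run's character style and its base styles,
    then the paragraph style and its base styles.
    """
    if run.font.size is not None:
        return run.font.size
    for style in (run.style, paragraph.style):
        # Walk up the inheritance chain until a size is set explicitly.
        while style is not None:
            if style.font.size is not None:
                return style.font.size
            style = style.base_style
    return None
```

With a real document this would be called as effective_font_size(run, paragraph) inside the nested loop over d.paragraphs and paragraph.runs.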
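The following code worked for me.
python-docx reports lengths in English Metric Units (EMU), and there are 12700 EMU per point, so divide by 12700 to get the font size in points (font.size.pt gives the same value). Runs whose size is not set explicitly report None and are skipped here:
import docx
docFile = docx.Document("C:/Users/vjadhav6/Desktop/testFile.docx")
for i in docFile.paragraphs:
    for j in i.runs:
        # font.size is None when the run inherits its size from a style
        if j.font.size is not None:
            print(j.font.size / 12700)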
I have been using the python-docx library and oxml to automate some changes to the tables in my Word document. Unfortunately, no matter what I do, I cannot get the text in the table cells to wrap.
I managed to manipulate the 'autofit' and 'fit-text' properties of my table, but neither of them makes the text wrap in the cells. I can see a "w:noWrap" element in the XML of my Word document, and no matter what I do I cannot manipulate or remove it. I believe it is what prevents word wrapping in my table.
For example, in this case I am adding a table. I can fit text in a cell and set autofit to 'true', but I cannot for the life of me wrap the text:
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
doc = Document()
table = doc.add_table(5,5)
table.autofit = True # Does Autofit but not wrapping
tc = table.cell(0,0)._tc # As a test, fit text to cell 0,0
tcPr = tc.get_or_add_tcPr()
tcFitText = OxmlElement('w:tcFitText')
tcFitText.set(qn('w:val'),"true")
tcPr.append(tcFitText) #Does fitting but no wrapping
doc.save('demo.docx')
I would appreciate any help or hints.
The <w:noWrap> element appears to be a child of <w:tcPr>, the element that controls table cell properties.
You should be able to access it from the table cell element using XPath:
tc = table.cell(0, 0)._tc
noWraps = tc.xpath(".//w:noWrap")
The noWraps variable here will then be a list containing zero or more <w:noWrap> elements, in your case probably one.
Deleting it is probably the simplest approach, which you can accomplish like this:
if noWraps:  # ---skip following code if list is empty---
    noWrap = noWraps[0]
    noWrap.getparent().remove(noWrap)
You can also take the approach of setting the value of the w:val attribute of the w:noWrap element, but then you have to get into specifying the Clark name of the attribute namespace, which adds some extra fuss and doesn't really produce a different outcome unless for some reason you want to keep that element around.
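To illustrate what the Clark name involves, here is a small sketch using only the standard library's xml.etree.ElementTree on a hand-written cell fragment, not python-docx itself. A Clark name is the namespace URI in braces followed by the local name. The helper clark() and the sample XML are made up for the example; in python-docx, the qn() function from docx.oxml.ns builds the same kind of name from a prefixed string such as 'w:val'.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the one bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def clark(local_name):
    """Build the Clark name '{namespace-uri}local' for a w: name."""
    return "{%s}%s" % (W_NS, local_name)


# Hand-written stand-in for a table cell that has wrapping disabled.
cell_xml = '<w:tc xmlns:w="%s"><w:tcPr><w:noWrap/></w:tcPr></w:tc>' % W_NS
tc = ET.fromstring(cell_xml)

# Keep the element but switch it off through its w:val attribute.
for no_wrap in tc.iter(clark("noWrap")):
    no_wrap.set(clark("val"), "false")

result = ET.tostring(tc, encoding="unicode")
```

The element stays in the cell properties with its value set to false, which is why deleting it is usually the simpler route.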
I am generating a document using the python-docx module.
I want to make the text in a specific cell of a row bold.
Here is the code:
book_title = '\nTitle-:\n {}\n\n'.format(book_title)
book_desc = 'Author-: {}\n\nDescription:\n{}\n\nSales Points:\n{}'.format(book.author,book_description,sales_point)
row1.cells[1].text = (book_title + book_desc)
I just want to bold the book_title.
If I apply a style it automatically applies to whole document.
A cell does not have a character style; character style can only be applied to text, and in particular to a run of text. This is in fact the defining characteristic of a run, being a sequence of characters that share the same character formatting, also known as font in python-docx.
To get the book title with a different font than the description, they need to appear in separate runs. Assigning to Cell.text (as you have) results in all the text being in a single run.
This might work for you, but assumes the cell is empty as you start:
paragraph = row1.cells[1].paragraphs[0]
title_run = paragraph.add_run(book_title)
description_run = paragraph.add_run(book_desc)
title_run.bold = True
This code can be made more compact:
paragraph = row1.cells[1].paragraphs[0]
paragraph.add_run(book_title).bold = True
paragraph.add_run(book_desc)
but perhaps the former version makes it more clear just what you're doing in each step.
Here is how I understand it:
A paragraph holds the run objects, and styles such as bold and italic are properties of a run.
Following this logic, here is what might solve your question:
row1_cells[0].paragraphs[0].add_run(book_title + book_desc).bold=True
This is just an example for the first cell of the table, and it bolds the whole string. To bold only the title, add book_title and book_desc as separate runs and set bold on the title run only.
Since you are using the docx module, you can style your text/paragraph by explicitly defining the style.
To apply a style, use the following snippet, taken from the python-docx documentation.
>>> from docx import Document
>>> document = Document()
>>> style = document.styles['Normal']
>>> font = style.font
>>> font.bold= True
This makes the font bold for every paragraph that uses the Normal style, so it affects the whole document rather than a single cell.
With docxtpl (python-docx-template, which builds on python-docx), the styling of any character in a docx template can be overridden with Rich Text styling. In your template, put a context variable at the position of the character or string that needs styling. This variable maps to a RichText object that carries the style you define in your code. To make things clearer, consider an example template "test.docx" that contains the following text:
Hello {{r context_var}}!
The {{..}} is the jinja2 tag syntax and {{r is the RichText tag that overrides the character styling. The context_var is a variable that maps the styling to your character string.
We accomplish Rich Text styling like this:
from docxtpl import DocxTemplate, RichText
doc = DocxTemplate("test.docx")
rt = RichText() #create a RichText object
rt.add('World', bold=True) #pass the text as an argument and the style, bold=True
context = { 'context_var': rt } #add context variable to the context and map it to rt
doc.render(context) #render the context
doc.save("generated_doc.docx") #save as a new document
Let's look at the contents of "generated_doc.docx":
Hello World!
I'm not sure how your template is designed, but if you just want the book_title as bold, your template "test.docx" should have text like:
Title:-
{{r book_title_var}}
The code should be modified to:
book_title = "Lord of the Rings" #or wherever you get the book title from
rt.add(book_title, bold=True)
context = { 'book_title_var': rt }
generated_doc.docx:
Title:-
Lord of the Rings
I have used multiple document.add_heading('xxx', level=Y) calls, and when I open the generated document in MS Word I can see that the levels are correct.
What I don't see is numbering, such as 1, 1.1, 1.1.1, etc. I just see the heading text.
How can I display heading numbers using python-docx?
Alphanumeric heading prefixes are automatically created based on the outline style and level of the heading. Set the outline style and insert the correct level and you will get the numbering.
From documentation:
_NumberingStyle objects class docx.styles.style._NumberingStyle[source] A numbering style. Not yet
implemented.
However, if you set the heading like this:
paragraph.style = document.styles['Heading 1']
then it should default to the latent numbering style of that heading.
There is a good workaround in python-docx for complex operations such as heading numbering. Proceed as follows:
Create a new blank document in Word and define your custom multilevel list there.
Save the blank document as my_template.docx.
Create the document in your python script based on your template:
from docx import Document
document = Document("my_template.docx")
document.add_heading('My first numbered heading', level=1)
And, voila, you can generate your perfect document with numbered headings.
This answer should help.
First, create a heading without numbering, like this:
paragraph = document.add_paragraph()
paragraph.style = document.styles['Heading 4']
This produces XML in the Word document like this:
<w:pPr>
<w:pStyle w:val="4"/>
</w:pPr>
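You can then access the "pStyle" element in that XML and change its value with the code below (qn comes from docx.oxml.ns; the value '4FDD' is specific to this document, so use the ID of the numbered heading style in your own):
paragraph._p.pPr.pStyle.set(qn('w:val'), u'4FDD')
Finally, open the Word file and the heading will use that style.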
def __str__(self):
    if self.nivel == 1:
        return str(Level.count_1)+'.- '+self.titulo
    elif self.nivel==2: # Printed for level 2 headings
        return str(Level.count_1)+'.'+str(Level.count_2)+'.- '+self.titulo
    elif self.nivel==3: # Printed for level 3 headings
        return str(Level.count_1)+'.'+str(Level.count_2)+'.'+str(Level.count_3)+'.- '+self.titulo
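The fragment above appears to build the heading prefix by hand from per-level counters. A self-contained sketch of that idea follows; the class HeadingNumberer and its method next_prefix are hypothetical names, not part of python-docx, and the reset rule (a new heading clears the counters of all deeper levels) is an assumption about the intended behaviour.

```python
class HeadingNumberer:
    """Keeps one counter per heading level and formats prefixes like 1.2.1."""

    def __init__(self, max_level=3):
        self.counters = [0] * max_level

    def next_prefix(self, level):
        # Increment this level's counter and reset every deeper level.
        self.counters[level - 1] += 1
        for i in range(level, len(self.counters)):
            self.counters[i] = 0
        return ".".join(str(n) for n in self.counters[:level])


numberer = HeadingNumberer()
prefixes = [numberer.next_prefix(level) for level in (1, 2, 2, 3, 1, 2)]
```

Each prefix could then be prepended to the heading text before it is passed to document.add_heading. The numbers produced this way are plain text, so Word will not renumber them if headings are moved later.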
So, the tkinter text editor!
The editor obviously needs to have text styles, which need to change typed text to whatever formatting is currently selected, using tags. But the problem is that the tag name needs to change when the formatting changes, otherwise the tag would be applied to the whole text.
(This was a problem that I was struggling to identify for quite some time)
To avoid this, you would need a LOT of tags (like bold, both, calibri etc), so the code would look like this
if style == 'bold':
    tag_add('bold', 'insert -1c', 'insert')
    tag_configure('bold', font=('Calibri', 12, 'bold'))
if style == 'italic':
    # etc etc
This is awful code, and makes different fonts/sizes impossible.
Is there a correct way of organising multiple tags like this, something like
tag.add(currentstyle, 'insert -1c', 'insert')
tag.config(currentstyle, font=(currentfont, currentsize, currentweight, currentslant))
Thanks for your help
UPDATE
solved with no small amount of help from Bryan
tagname = '{}-{}-{}-{}'.format(font, fontsize, weight, slant)
textbox.tag_add(tagname, 'insert -1c', 'insert')
textbox.tag_configure(tagname, font=(font, fontsize, weight, slant))
Now every tag has a unique name.
Yes, you will need to create a unique tag for every different font you use. In practice this isn't so bad, because most documents only use 3-4 variations, or perhaps a worst case of maybe a dozen. The only real difficulty is that if you want both bold and italics you have to create a bold tag, an italics tag, and a bold-italics tag.
"This is awful code, and makes different fonts/sizes impossible."
It doesn't make it impossible, just slightly difficult. Your code is actually pretty close to how you would do it.
When a user changes the style of a character, you need to create a canonical form for the style name by combining the current style and any new attributes. For example, if the character is currently bold 12 point and they change it to italic 14 point, the new tag might be "italic-14". If they want to keep the bold it might be "bold-italic-14". You then need to check whether you already have a tag by that name and create it if you don't, then remove any previous font tag and add the new font tag.
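A minimal sketch of that bookkeeping is shown below. The helper names font_tag_name, font_spec, and apply_font are hypothetical, as is the naming scheme (family, size, then optional bold and italic). The widget argument is assumed to be a tkinter Text widget, and only its tag_configure, tag_remove, and tag_add methods are used.

```python
def font_tag_name(family, size, bold=False, italic=False):
    """Build one canonical tag name per distinct font combination."""
    parts = [family.lower().replace(" ", "_"), str(size)]
    if bold:
        parts.append("bold")
    if italic:
        parts.append("italic")
    return "-".join(parts)


def font_spec(family, size, bold=False, italic=False):
    """Build the font tuple that tag_configure expects."""
    return (family, size,
            "bold" if bold else "normal",
            "italic" if italic else "roman")


def apply_font(text_widget, known_tags, start, end, **style):
    """Create the tag if needed, drop older font tags, then apply the new one."""
    name = font_tag_name(**style)
    if name not in known_tags:
        text_widget.tag_configure(name, font=font_spec(**style))
        known_tags.add(name)
    # A range should carry only one font tag, so remove the others first.
    for old in known_tags - {name}:
        text_widget.tag_remove(old, start, end)
    text_widget.tag_add(name, start, end)
    return name
```

A call might look like apply_font(textbox, known_tags, 'insert -1c', 'insert', family='Calibri', size=14, bold=True, italic=True), with known_tags being a set kept alongside the widget.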
This is really only a problem with fonts. For other attributes such as colors and borders you can simply use all the different tags separately (i.e., if you create a tag for "background-blue" and one for "foreground-red", you can apply both of those tags separately to the text).
I provide an example that does something similar to this here: https://stackoverflow.com/a/3736494/7432