accessing bookmarks using python-docx

accessing bookmarks using python-docx - python

I am using the python-docx module to read and edit a .docm file,
The file contains bookmarks, how do I access all the bookmarks already stored using that module, there doesnt seem to be any methods within the doc object.

As commented by #D Malan, now in 2021-november it is still an open issue in the python-docx.
Meanwhile we can live with our own implementation.
Please create a file named docxbookmark.py in a folder accessible as an import:
from docx.document import Document as _innerdoclass
from docx import Document as _innerdocfn
from docx.oxml.shared import qn
from lxml.etree import Element as El
class Document(_innerdoclass):
def _bookmark_elements(self, recursive=True):
if recursive:
startag = qn('w:start')
bkms = []
def _bookmark_elements_recursive(parent):
if parent.tag == startag:
bkms.append(parent)
for el in parent:
_bookmark_elements_recursive(el)
_bookmark_elements_recursive(self._element)
return bkms
else:
return self._element.xpath('//'+qn('w:bookmarkStart'))
def bookmark_names(self):
"""
Gets a list of bookmarks
"""
return [v for bkmkels in self._bookmark_elements() for k,v in bkmkels.items() if k.endswith('}name')]
def add_bookmark(self, bookmarkname):
"""
Adds a bookmark with bookmark with name bookmarkname to the end of the file
"""
el = [el for el in self._element[0] if el.tag.endswith('}p')][-1]
el.append(El(qn('w:bookmarkStart'),{qn('w:id'):'0',qn('w:name'):bookmarkname}))
el.append(El(qn('w:bookmarkEnd'),{qn('w:id'):'0'}))
def __init__(self, innerDocInstance = None):
super().__init__(Document, None)
if innerDocInstance is not None and type(innerDocInstance) is _innerdoclass:
self.__body = innerDocInstance.__body
self._element = innerDocInstance._element
self._part = innerDocInstance._part
def DocumentCreate(docx=None):
"""
Return a |Document| object loaded from *docx*, where *docx* can be
either a path to a ``.docx`` file (a string) or a file-like object. If
*docx* is missing or ``None``, the built-in default document "template"
is loaded.
"""
return Document(_innerdocfn(docx))
Now we can use our facade implementation just like the old one, along with those new add_bookmark and bookmark_names.
To add a bookmark in a new file, import our implementation and use add_bookmark on the document object:
from docxbookmark import DocumentCreate as Document
doc = Document()
document.add_paragraph('First Paragraph')
document.add_bookmark('FirstBookmark')
document.add_paragraph('Second Paragraph')
document.save('docwithbookmarks.docx')
To see bookmarks in a document, import our implementation and use bookmark_names on the document object:
from docxbookmark import DocumentCreate as Document
doc = Document('docwithbookmarks.docx')
doc.bookmark_names()
The returned list is simplier than other objects, it shows only strings not objects. There is an internal _bookmark_elements which will return lxml nodes which are not the same as python-docx objects.
Just a few tests were made, probably not working in many cases. Please tell in the comments if it didn't work.

Related

How to create Sphinx role which automatically creates a label for references?

In my documentation, I often refer to XML tags. So I try to write a small extension to simplify adding and referencing tags to the documentation. The documentation shall look like this:
My text, referencing to :xmltag-ref:`ExampleTag` here.
The :xmltag:`ExampleTag` Explained in Detail
--------------------------------------------
This section describes the tag...
I wrote a simple extension like this:
import re
from docutils import nodes
def xmltag(name, rawtext, text, lineno, inliner, options={}, content=[]):
literal_node = nodes.literal(text=text, classes=['xmltag'])
# <--- HERE: How to add target?
result_nodes = [literal_node]
return result_nodes, []
def xmltag_ref(name, rawtext, text, lineno, inliner, options={}, content=[]):
source = f'tag-{text}'
reference_node = nodes.reference(classes=['xmltag-ref'], text=text)
reference_node['refid'] = source
result_nodes = [reference_node]
return result_nodes, []
def setup(app):
app.add_role("xmltag-ref", xmltag_ref)
app.add_role("xmltag", xmltag)
return {
'version': '1.0',
'parallel_read_safe': True,
'parallel_write_safe': True,
}
If I add a reference manually, everything works as expected, but I could not find a way how to add a label automatically if the Tag is defined in a title.
I like how the automatic reference works, e.g. using the option directive:
.. option:: --example
text...
This automatically creates a label, which can be referenced using :option:'--example'. I would like to have something similar, but with the flexibility to use it in a title, which also creates an entry in the TOC.
How can I add a label in a role element?
Or, is there another way, how this should be solved correctly?

Can I call sphinx.parsers.Parser() directly and parse a fragment of reST?

I need to parse a stand-alone fragment of reST content to a doctree (for later processing). I can do it via docutils easily enough, e.g.:
# ref: http://stackoverflow.com/questions/12883428/
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
def parse_rst(text: str) -> docutils.nodes.document:
parser = docutils.parsers.rst.Parser()
components = (docutils.parsers.rst.Parser,)
settings = docutils.frontend.OptionParser(
components=components).get_default_values()
document = docutils.utils.new_document('<rst-doc>', settings=settings)
parser.parse(text, document)
return document
class MyVisitor(docutils.nodes.NodeVisitor):
def visit_reference(self, node: docutils.nodes.reference) -> None:
"""Called for "reference" nodes."""
print(node)
def unknown_visit(self, node: docutils.nodes.Node) -> None:
"""Called for all other node types."""
print(node)
if __name__ == '__main__':
doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
That works unless the reST content includes Sphinx-specific directives/roles (e.g. .. glossary::, :term:, etc).
So I need the Sphinx parser rather than the docutils one.
I tried subclassing sphinx.parsers.Parser; using from sphinx.application import Sphinx then app.build() to fake a sphinx docs project.
But references and workarounds get complicated so I suspect I'm approaching it the wrong way. Can I use the sphinx parser outside of the full-blown sphinx-build workflow?

Creating Custom Tag in PyYAML

I'm trying to use Python's PyYAML to create a custom tag that will allow me to retrieve environment variables with my YAML.
import os
import yaml
class EnvTag(yaml.YAMLObject):
yaml_tag = u'!Env'
def __init__(self, env_var):
self.env_var = env_var
def __repr__(self):
return os.environ.get(self.env_var)
settings_file = open('conf/defaults.yaml', 'r')
settings = yaml.load(settings_file)
And inside of defaults.yaml is simply:
example: !ENV foo
The error I keep getting:
yaml.constructor.ConstructorError:
could not determine a constructor for the tag '!ENV' in
"defaults.yaml", line 1, column 10
I plan to have more than one custom tag as well (assuming I can get this one working)

Your PyYAML class had a few problems:
yaml_tag is case sensitive, so !Env and !ENV are different tags.
So, as per the documentation, yaml.YAMLObject uses meta-classes to define itself, and has default to_yaml and from_yaml functions for those cases. By default, however, those functions require that your argument to your custom tag (in this case !ENV) be a mapping. So, to work with the default functions, your defaults.yaml file must look like this (just for example) instead:
example: !ENV {env_var: "PWD", test: "test"}
Your code will then work unchanged, in my case print(settings) now results in {'example': /home/Fred} But you're using load instead of safe_load -- in their answer below, Anthon pointed out that this is dangerous because the parsed YAML can overwrite/read data anywhere on the disk.
You can still easily use your YAML file format, example: !ENV foo—you just have to define an appropriate to_yaml and from_yaml in class EnvTag, ones that can parse and emit scalar variables like the string "foo".
So:
import os
import yaml
class EnvTag(yaml.YAMLObject):
yaml_tag = u'!ENV'
def __init__(self, env_var):
self.env_var = env_var
def __repr__(self):
v = os.environ.get(self.env_var) or ''
return 'EnvTag({}, contains={})'.format(self.env_var, v)
#classmethod
def from_yaml(cls, loader, node):
return EnvTag(node.value)
#classmethod
def to_yaml(cls, dumper, data):
return dumper.represent_scalar(cls.yaml_tag, data.env_var)
# Required for safe_load
yaml.SafeLoader.add_constructor('!ENV', EnvTag.from_yaml)
# Required for safe_dump
yaml.SafeDumper.add_multi_representer(EnvTag, EnvTag.to_yaml)
settings_file = open('defaults.yaml', 'r')
settings = yaml.safe_load(settings_file)
print(settings)
s = yaml.safe_dump(settings)
print(s)
When this program is run, it outputs:
{'example': EnvTag(foo, contains=)}
{example: !ENV 'foo'}
This code has the benefit of (1) using the original pyyaml, so nothing extra to install and (2) adding a representer. :)

I'd like to share how I resolved this as an addendum to the great answers above provided by Anthon and Fredrick Brennan. Thank you for your help.
In my opinion, the PyYAML document isn't real clear as to when you might want to add a constructor via a class (or "metaclass magic" as described in the doc), which may involve re-defining from_yaml and to_yaml, or simply adding a constructor using yaml.add_constructor.
In fact, the doc states:
You may define your own application-specific tags. The easiest way to do it is to define a subclass of yaml.YAMLObject
I would argue the opposite is true for simpler use-cases. Here's how I managed to implement my custom tag.
config/__init__.py
import yaml
import os
environment = os.environ.get('PYTHON_ENV', 'development')
def __env_constructor(loader, node):
value = loader.construct_scalar(node)
return os.environ.get(value)
yaml.add_constructor(u'!ENV', __env_constructor)
# Load and Parse Config
__defaults = open('config/defaults.yaml', 'r').read()
__env_config = open('config/%s.yaml' % environment, 'r').read()
__yaml_contents = ''.join([__defaults, __env_config])
__parsed_yaml = yaml.safe_load(__yaml_contents)
settings = __parsed_yaml[environment]
With this, I can now have a seperate yaml for each environment using an env PTYHON_ENV (default.yaml, development.yaml, test.yaml, production.yaml). And each can now reference ENV variables.
Example default.yaml:
defaults: &default
app:
host: '0.0.0.0'
port: 500
Example production.yaml:
production:
<<: *defaults
app:
host: !ENV APP_HOST
port: !ENV APP_PORT
To use:
from config import settings
"""
If PYTHON_ENV == 'production', prints value of APP_PORT
If PYTHON_ENV != 'production', prints default 5000
"""
print(settings['app']['port'])

If your goal is to find and replace environment variables (as strings) defined in your yaml file, you can use the following approach:
example.yaml:
foo: !ENV "Some string with ${VAR1} and ${VAR2}"
example.py:
import yaml
# Define the function that replaces your env vars
def env_var_replacement(loader, node):
replacements = {
'${VAR1}': 'foo',
'${VAR2}': 'bar',
}
s = node.value
for k, v in replacements.items():
s = s.replace(k, v)
return s
# Define a loader class that will contain your custom logic
class EnvLoader(yaml.SafeLoader):
pass
# Add the tag to your loader
EnvLoader.add_constructor('!ENV', env_var_replacement)
# Now, use your custom loader to load the file:
with open('example.yaml') as yaml_file:
loaded_dict = yaml.load(yaml_file, Loader=EnvLoader)
# Prints: "Some string with foo and bar"
print(loaded_dict['foo'])
It's worth noting, you don't necessarily need to create a custom EnvLoader class. You can call add_constructor directly on the SafeLoader class or the yaml module itself. However, this can have an unintended side-effect of adding your loader globally to all other modules that rely on those loaders, which could potentially cuase problems if those other modules have their own custom logic for loading that !ENV tag.

There are several problems with your code:
!Env in your YAML file is not the same as !ENV in your code.
You are missing the classmethod from_yaml that has to be provided for EnvTag.
Your YAML document specifies a scalar for !Env, but the subclassing mechanism for yaml.YAMLObject calls construct_yaml_object which in turn calls construct_mapping so a scalar is not allowed.
You are using .load(). This is unsafe, unless you have complete control over the YAML input, now and in the future. Unsafe in the sense that uncontrolled YAML can e.g. wipe or upload any information from your disc. PyYAML doesn't warn you for that possible loss.
PyYAML only supports most of YAML 1.1, the latest YAML specification is 1.2 (from 2009).
You should consistently indent your code at 4 spaces at every level (or 3 spaces, but not 4 at the first and 3 a the next level).
your __repr__ doesn't return a string if the environment variable is not set, which will throw an error.
So change your code to:
import sys
import os
from ruamel import yaml
yaml_str = """\
example: !Env foo
"""
class EnvTag:
yaml_tag = u'!Env'
def __init__(self, env_var):
self.env_var = env_var
def __repr__(self):
return os.environ.get(self.env_var, '')
#staticmethod
def yaml_constructor(loader, node):
return EnvTag(loader.construct_scalar(node))
yaml.add_constructor(EnvTag.yaml_tag, EnvTag.yaml_constructor,
constructor=yaml.SafeConstructor)
data = yaml.safe_load(yaml_str)
print(data)
os.environ['foo'] = 'Hello world!'
print(data)
which gives:
{'example': }
{'example': Hello world!}
Please note that I am using ruamel.yaml (disclaimer: I am the author of that package), so you can use YAML 1.2 (or 1.1) in your YAML file. With minor changes you can do the above with the old PyYAML as well.
You can do this by subclassing of YAMLObject as well, and in a safe way:
import sys
import os
from ruamel import yaml
yaml_str = """\
example: !Env foo
"""
yaml.YAMLObject.yaml_constructor = yaml.SafeConstructor
class EnvTag(yaml.YAMLObject):
yaml_tag = u'!Env'
def __init__(self, env_var):
self.env_var = env_var
def __repr__(self):
return os.environ.get(self.env_var, '')
#classmethod
def from_yaml(cls, loader, node):
return EnvTag(loader.construct_scalar(node))
data = yaml.safe_load(yaml_str)
print(data)
os.environ['foo'] = 'Hello world!'
print(data)
This will give you the same results as above.

How to parse restructuredtext in python?

Is there any module that can parse restructuredtext into a tree model?
Can docutils or sphinx do this?

I'd like to extend upon the answer from Gareth Latty. "What you probably want is the parser at docutils.parsers.rst" is a good starting point of the answer, but what's next? Namely:
How to parse restructuredtext in python?
Below is the exact answer for Python 3.6 and docutils 0.14:
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import docutils.frontend
def parse_rst(text: str) -> docutils.nodes.document:
parser = docutils.parsers.rst.Parser()
components = (docutils.parsers.rst.Parser,)
settings = docutils.frontend.OptionParser(components=components).get_default_values()
document = docutils.utils.new_document('<rst-doc>', settings=settings)
parser.parse(text, document)
return document
And the resulting document can be processed using, for example, below, which will print all references in the document:
class MyVisitor(docutils.nodes.NodeVisitor):
def visit_reference(self, node: docutils.nodes.reference) -> None:
"""Called for "reference" nodes."""
print(node)
def unknown_visit(self, node: docutils.nodes.Node) -> None:
"""Called for all other node types."""
pass
Here's how to run it:
doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)

Docutils does indeed contain the tools to do this.
What you probably want is the parser at docutils.parsers.rst
See this page for details on what is involved. There are also some examples at docutils/examples.py - particularly check out the internals() function, which is probably of interest.

python google closure compiler Source class question

This code is copy from http://code.google.com/p/closure-library/source/browse/trunk/closure/bin/build/source.py
The Source class's __str
__method referred self._path
Is it a special property for self?
Cuz, i couldn't find the place define this variable at Source Class
import re
_BASE_REGEX_STRING = '^\s*goog\.%s\(\s*[\'"](.+)[\'"]\s*\)'
_PROVIDE_REGEX = re.compile(_BASE_REGEX_STRING % 'provide')
_REQUIRES_REGEX = re.compile(_BASE_REGEX_STRING % 'require')
# This line identifies base.js and should match the line in that file.
_GOOG_BASE_LINE = (
'var goog = goog || {}; // Identifies this file as the Closure base.')
class Source(object):
"""Scans a JavaScript source for its provided and required namespaces."""
def __init__(self, source):
"""Initialize a source.
Args:
source: str, The JavaScript source.
"""
self.provides = set()
self.requires = set()
self._source = source
self._ScanSource()
def __str__(self):
return 'Source %s' % self._path #!!!!!! what is self_path !!!!
def GetSource(self):
"""Get the source as a string."""
return self._source
def _ScanSource(self):
"""Fill in provides and requires by scanning the source."""
# TODO: Strip source comments first, as these might be in a comment
# block. RegExes can be borrowed from other projects.
source = self.GetSource()
source_lines = source.splitlines()
for line in source_lines:
match = _PROVIDE_REGEX.match(line)
if match:
self.provides.add(match.group(1))
match = _REQUIRES_REGEX.match(line)
if match:
self.requires.add(match.group(1))
# Closure's base file implicitly provides 'goog'.
for line in source_lines:
if line == _GOOG_BASE_LINE:
if len(self.provides) or len(self.requires):
raise Exception(
'Base files should not provide or require namespaces.')
self.provides.add('goog')
def GetFileContents(path):
"""Get a file's contents as a string.
Args:
path: str, Path to file.
Returns:
str, Contents of file.
Raises:
IOError: An error occurred opening or reading the file.
"""
fileobj = open(path)
try:
return fileobj.read()
finally:
fileobj.close()

No, _path is just an attribute that may or me not be set on an object like any other attribute. The leading underscore simply means that the author felt it was an internal detail of the object and didn't want it regarded as part of the public interface.
In this particular case, unless something is setting the attribute from outside that source file, it looks like it's simply a mistake. It won't do any harm unless anyone ever tries to call str() on a Source object and probably nobody ever does.
BTW, you seem to be thinking there is something special about self. The name self isn't special in any way: it's a convention to use this name for the first parameter of a method, but it is just a name like any other that refers to the object being processed. So if you could access self._path without causing an error you could access it equally well through any other name for the object.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

accessing bookmarks using python-docx - python

I am using the python-docx module to read and edit a .docm file, The file contains bookmarks, how do I access all the bookmarks already stored using that module, there doesnt seem to be any methods within the doc object.

Related

How to create Sphinx role which automatically creates a label for references?

Can I call sphinx.parsers.Parser() directly and parse a fragment of reST?

Creating Custom Tag in PyYAML

How to parse restructuredtext in python?

python google closure compiler Source class question

Categories

Resources