Convert reStructuredText to plain text programmatically in Python


Say I have some reStructuredText source in a string:
source = """
============
Introduction
============

Hello world.

.. code-block:: bash

    $ echo Greetings.
"""
import sys
import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import sphinx.writers.text
import sphinx.builders.text
def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document
if __name__ == '__main__':
    document = parse_rst(source)
I'd like to convert it into plain text without the reST markup using Python.
I tried to use sphinx.builders.text.TextBuilder but it seems to want an App object, not a string.
Here is a related question about doing it manually on the command line with files instead of strings.
The parsing code comes from this answer.

This code works, although it relies on a few hacks such as setting a fake config dir; there may be a better way.
import sys
import textwrap
import types
import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import sphinx.writers.text
import sphinx.builders.text
import sphinx.util.osutil
def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(
        components=components
    ).get_default_values()
    document = docutils.utils.new_document("<rst-doc>", settings=settings)
    parser.parse(text, document)
    return document
if __name__ == "__main__":
    source = textwrap.dedent(
        """\
        ============
        Introduction
        ============

        Hello world.

        .. code-block:: bash

            $ echo Greetings.
        """
    )
    document = parse_rst(source)
    # Minimal stand-in for the Sphinx application object that TextBuilder expects.
    app = types.SimpleNamespace(
        srcdir=None,
        confdir=None,
        outdir=None,
        doctreedir="/",
        config=types.SimpleNamespace(
            text_newlines="native",
            text_sectionchars="=",
            text_add_secnumbers=False,
            text_secnumber_suffix=".",
        ),
        tags=set(),
        registry=types.SimpleNamespace(
            create_translator=lambda self, something, new_builder: sphinx.writers.text.TextTranslator(
                document, new_builder
            )
        ),
    )
    builder = sphinx.builders.text.TextBuilder(app)
    translator = sphinx.writers.text.TextTranslator(document, builder)
    document.walkabout(translator)
    print(translator.body)
Output:
Introduction
============
Hello world.
$ echo Greetings.
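If the Sphinx text layout is not needed, a lighter alternative is to let docutils build the document tree and call its astext() method, which concatenates the text of all nodes. This is only a minimal sketch and not the method used in the answer above. The function name rst_to_text is made up for illustration, and syntax_highlight is set to "none" on the assumption that Pygments may not be installed.

```python
import textwrap

import docutils.core


def rst_to_text(rst_source: str) -> str:
    """Parse reST with docutils and return the text content of the doctree."""
    doctree = docutils.core.publish_doctree(
        rst_source,
        # Assumption: skip Pygments so the code block stays plain literal text.
        settings_overrides={"syntax_highlight": "none"},
    )
    return doctree.astext()


source = textwrap.dedent(
    """\
    ============
    Introduction
    ============

    Hello world.

    .. code-block:: bash

        $ echo Greetings.
    """
)

text = rst_to_text(source)
print(text)
```

Unlike the Sphinx text writer, astext() does no layout, so section underlines and the indentation of literal blocks are not reproduced.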

Sphinx comes with a TextBuilder. From the command line:
make text

Related

How to access a docstring from a separate script?

I'm building a GUI that lets users select Python scripts to run. Each script has its own docstring explaining the script's inputs and outputs. I want to display that docstring in the UI once a script has been highlighted (but not yet selected to run), and I can't seem to get access to the docstrings from the base program.
Example:
test.py
"""this is a docstring"""
print('hello world')
program.py
In this example index refers to test.py, but normally it isn't known in advance because it's whatever the user has selected in the GUI.
import ast
# index is test.py
def on_selected(self, index):
    script_path = self.tree_view_model.filePath(index)
    fparse = ast.parse(''.join(open(script_path)))
    self.textBrowser_description.setPlainText(ast.get_docstring(fparse))

Let's say the docstring you want to access belongs to the file file.py.
You can get the docstring by doing the following:
import file
print(file.__doc__)
If you want to get the docstring before importing the file, you could read the file and extract the docstring yourself. Here is an example:
import re
def get_docstring(file):
    with open(file, "r") as f:
        content = f.read()  # read file
    quote = content[0]  # get type of quote
    pattern = re.compile(rf"^{quote}{quote}{quote}[^{quote}]*{quote}{quote}{quote}")  # create docstring pattern
    return re.findall(pattern, content)[0][3:-3]  # return docstring without quotes
print(get_docstring("file.py"))
Note: For this regex to work the docstring will need to be at the very top.
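To illustrate that limitation, here is a small sketch comparing a regex of the same shape with ast.get_docstring on a script that starts with a shebang line. The sample source and the helper name regex_docstring are invented for this example.

```python
import ast
import re

# Hypothetical script whose docstring is NOT on the very first line.
SCRIPT = '''#!/usr/bin/env python3
"""this is a docstring"""
print('hello world')
'''


def regex_docstring(content):
    """Return the leading triple-quoted string, or None if the text does not start with one."""
    quote = content[0]
    if quote not in "\"'":
        return None
    match = re.match(rf"{quote}{{3}}([^{quote}]*){quote}{{3}}", content)
    return match.group(1) if match else None


print(regex_docstring(SCRIPT))               # None: the text starts with '#'
print(ast.get_docstring(ast.parse(SCRIPT)))  # this is a docstring
```

Parsing with ast only reads the source; the print call inside the sample script is never executed.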

Here's how to get it via importlib. Most of the logic has been put in a function. Note that using importlib does import the script (which causes all its top-level statements to be executed), but the module itself is discarded when the function returns.
If this was the script docstring_test.py in the current directory that I wanted to get the docstring from:
""" this is a multiline
docstring.
"""
print('hello world')
Here's how to do it:
import importlib.util
def get_docstring(script_name, script_path):
    spec = importlib.util.spec_from_file_location(script_name, script_path)
    foo = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(foo)
    return foo.__doc__
if __name__ == '__main__':
    print(get_docstring('docstring_test', "./docstring_test.py"))
Output:
hello world
this is a multiline
docstring.
Update:
Here's how to do it by letting the ast module in the standard library do the parsing which avoids both importing/executing the script as well as trying to parse it yourself with a regex.
This looks more-or-less equivalent to what's in your question, so it's unclear why what you have isn't working for you.
import ast
def get_docstring(script_path):
    with open(script_path, 'r') as file:
        tree = ast.parse(file.read())
    return ast.get_docstring(tree, clean=False)
if __name__ == '__main__':
    print(repr(get_docstring('./docstring_test.py')))
Output:
' this is a multiline\n docstring.\n'
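The clean=False argument keeps the docstring's original indentation. As a small sketch of the difference, the snippet below parses an in-memory source string (made up for this example, with a four-space indent on the second line) using both settings. With the default clean=True, ast.get_docstring cleans up the indentation with inspect.cleandoc.

```python
import ast

# Hypothetical script source held in a string instead of a file.
SOURCE = '''""" this is a multiline
    docstring.
"""
print('hello world')
'''

tree = ast.parse(SOURCE)  # parsing does not execute the script
raw = ast.get_docstring(tree, clean=False)
cleaned = ast.get_docstring(tree)  # clean=True is the default

print(repr(raw))      # indentation and trailing newline preserved
print(repr(cleaned))  # leading/trailing whitespace and common indent removed
```

For display in a GUI text widget, the cleaned form is usually the more convenient of the two.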

ANSI colour text not displayed in pytest-html report

In the pytest-html report, ANSI colour text is not displayed correctly, but in the console I can see the output without any issue. Please see my conftest.py and let me know what I have to change for it to be displayed correctly.
from datetime import datetime
from py.xml import html
import pytest
import json
import globals
from Process.RunProcess import RunProcess
from os import path
import sys
from ansi2html import Ansi2HTMLConverter
from ansi2html.converter import main, \
    ANSI_VISIBILITY_ON, ANSI_VISIBILITY_OFF, \
    ANSI_BLINK_SLOW, ANSI_BLINK_FAST, ANSI_BLINK_OFF, \
    ANSI_NEGATIVE_ON, ANSI_NEGATIVE_OFF, \
    ANSI_INTENSITY_INCREASED, ANSI_INTENSITY_REDUCED, ANSI_INTENSITY_NORMAL
from ansi2html.util import read_to_unicode
@pytest.mark.optionalhook
def pytest_html_results_table_header(cells):
    # cells.insert(2, html.th('Status_code'))
    cells.insert(1, html.th('Time', class_='sortable time', col='time'))
    cells.pop()
@pytest.mark.optionalhook
def pytest_html_results_table_row(report, cells):
    # cells.insert(2, html.td(report.status_code))
    cells.insert(1, html.td(datetime.utcnow(), class_='col-time'))
    cells.pop()
@pytest.mark.hookwrapper
def pytest_runtest_makereport(item, call):
    outcome = yield
    # Ansi2HTMLConverter(linkify=True).convert(outcome.get_result())
    report = outcome.get_result()
    # report.status_code = str(item.function)
Please see the difference between the console output and the HTML report in the attached images.

For me, it worked as soon as I installed the required dependency ansi2html, as described in https://github.com/pytest-dev/pytest-html#ansi-codes
(without using the Ansi2HTMLConverter directly). However, I don't implement the pytest_runtest_makereport hook.

Script doesn't work on Linux

I made a small Discord bot in Python. On Windows it works perfectly fine, but when I try to run it on Raspbian with the command "python3 Bot.py", it reports invalid syntax.
Here's the code:
import feedparser
from yaml import load, dump
from json import dumps as jdump
from requests import post
import xml.etree.ElementTree as ET
BASE_URL = "https://discordapp.com/api"
def get_from_summary(summary):
    root = ET.fromstring(f"<element>{summary}</element>")
    d = f"{root[1].text}\n\n{root[2].text}"
    i = root[0].attrib["src"]
    return (d, i)
The syntax error is reported at root = ET.fromstring(f"<element>{summary}</element>"), pointing at the closing ".

The code uses formatted string literals (the f"<element>{summary}</element>"), which were only introduced in Python 3.6, so you need to use at least that version of Python.
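To make the version requirement concrete, here is a minimal sketch that prints the version of the running interpreter and builds the same strings with str.format, which also works on Python versions older than 3.6. The function name follows the question; the sample summary value is invented for the example.

```python
import sys
import xml.etree.ElementTree as ET


def get_from_summary(summary):
    """Same logic as in the question, written without f-strings."""
    root = ET.fromstring("<element>{}</element>".format(summary))
    description = "{}\n\n{}".format(root[1].text, root[2].text)
    image_url = root[0].attrib["src"]
    return (description, image_url)


# f-strings need Python 3.6+; this shows what `python3` actually resolves to.
print("Running Python {}.{}".format(*sys.version_info[:2]))

# Hypothetical feed summary, made up for illustration.
sample = '<img src="https://example.com/a.png"/><p>First</p><p>Second</p>'
print(get_from_summary(sample))
```

Upgrading the interpreter is the simpler fix if the rest of the bot also relies on newer language features.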

Python, create shortcut with two paths and argument

I'm trying to create a shortcut through Python that will launch a file in another program with an argument, e.g.:
"C:\file.exe" "C:\folder\file.ext" argument
The code I've tried messing with:
from win32com.client import Dispatch
import os
shell = Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut(path)
shortcut.Targetpath = r'"C:\file.exe" "C:\folder\file.ext"'
shortcut.Arguments = argument
shortcut.WorkingDirectory = "C:\\" #or "C:\folder\file.ext" in this case?
shortcut.save()
But I get an error:
AttributeError: Property '<unknown>.Targetpath' can not be set.
I've tried different formats of the string, and Google doesn't seem to know the solution to this problem.

from comtypes.client import CreateObject
from comtypes.gen import IWshRuntimeLibrary
shell = CreateObject("WScript.Shell")
shortcut = shell.CreateShortCut(path).QueryInterface(IWshRuntimeLibrary.IWshShortcut)
shortcut.TargetPath = r"C:\file.exe"  # raw string, so "\f" is not read as a form feed
args = [r"C:\folder\file.ext", argument]
shortcut.Arguments = " ".join(args)
shortcut.Save()
Reference
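One pitfall when writing Windows paths in Python source: in an ordinary string literal, \f is the form-feed escape, so a path such as C:\file.exe is silently altered. The sketch below shows the effect and assembles an arguments string with the file path wrapped in double quotes. The argument value is a placeholder, and no shortcut is created.

```python
plain = "C:\file.exe"  # "\f" is interpreted as a form feed character
raw = r"C:\file.exe"   # raw string: the backslash is kept literally

print(repr(plain))  # shows \x0c where "\f" used to be
print(repr(raw))

argument = "--example"  # placeholder for the real argument
# Quote the file path so it survives as one argument even if it contains spaces.
args = ['"{}"'.format(r"C:\folder\file.ext"), argument]
arguments = " ".join(args)
print(arguments)
```

Doubling the backslashes ("C:\\file.exe") gives the same string as the raw literal.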

Here is how to do it on Python 3.6 (the second import in @wombatonfire's solution can no longer be found).
First I ran pip install comtypes, then:
import comtypes
from comtypes.client import CreateObject
from comtypes.persist import IPersistFile
from comtypes.shelllink import ShellLink
# Create a link
s = CreateObject(ShellLink)
s.SetPath('C:\\myfile.txt')
# s.SetArguments('arg1 arg2 arg3')
# s.SetWorkingDirectory('C:\\')
# s.SetIconLocation('path\\to\\.exe\\or\\.ico\\file', 1)
# s.SetDescription('bla bla bla')
# s.Hotkey=1601
# s.ShowCMD=1
p = s.QueryInterface(IPersistFile)
p.Save("C:\\link to myfile.lnk", True)
# Read information from a link
s = CreateObject(ShellLink)
p = s.QueryInterface(IPersistFile)
p.Load("C:\\link to myfile.lnk", True)
print(s.GetPath())
# print(s.GetArguments())
# print(s.GetWorkingDirectory())
# print(s.GetIconLocation())
# print(s.GetDescription())
# print(s.Hotkey)
# print(s.ShowCmd)
See site-packages/comtypes/shelllink.py for more info.

Problems using the Python interface of the Berkeley Parser

I am using the Berkeley Parser's Python interface. I want to pass the input to the parser as a string rather than a file. The usage is explained in this example: https://github.com/emcnany/berkeleyinterface/blob/master/examples/example.py
Here is the documentation for the interface:
https://github.com/emcnany/berkeleyinterface/blob/master/BerkeleyInterface/berkeleyinterface.py
I am following that guide, but when I run the code below, nothing happens after reaching the last line and the code never finishes.
import os
from BerkeleyInterface import *
from StringIO import StringIO
JAR_PATH = r'C:\berkeleyparser\parser.jar'
GRM_PATH = r'C:\berkeleyparser\english.gr'
cp = os.environ.get("BERKELEY_PARSER_JAR", JAR_PATH)
gr = os.environ.get("BERKELEY_PARSER_GRM", GRM_PATH)
startup(cp)
args = {"gr":gr, "tokenize":True}
opts = getOpts(dictToArgs(args))
parser = loadGrammar(opts)
print("parser loaded")
strIn = StringIO("hello world how are you today")
strOut = StringIO()
parseInput(parser, opts, outputFile=strOut)
