Custom highlighting for string literals in Jupyter - python

Is there prior art for either of the following?
(Simple?) Add a new cell type to Jupyter notebook and apply custom syntax highlighting to all string literals in cells of that type while otherwise treating these cells exactly like regular Python cells. I'd be fine to do the parsing myself, but I'm not sure how one would split the highlighting work between Jupyter and my custom highlighter. I'd be happy to dig around the Jupyter source code to make this work, i.e., I'd accept this not having the form of a regular Jupyter extension in the end.
(Hard?) Apply custom syntax highlighting to all strings with an s prefix (e.g. s'myString') and, before executing the cell, remove the s while keeping any additional valid Python string prefixes such as f, u, or r.
If not, could someone sketch the steps required to make this work? I don't know much JS so I'd prefer Python solutions, but I'm happy to learn the required JS if needed.
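For the second option, the prefix-stripping step can be sketched in pure Python, independently of any highlighting. This is only a minimal sketch under stated assumptions: the function name strip_s_prefix and the VALID_PREFIXES set are made up for illustration, only a lowercase s is treated as the custom marker, and hooking the function into the notebook's execution path is not shown.

```python
import io
import tokenize

# Prefixes Python itself accepts once the custom "s" marker is removed.
VALID_PREFIXES = {"", "r", "u", "f", "b", "br", "rb", "fr", "rf"}


def strip_s_prefix(source: str) -> str:
    """Remove a custom 's' marker from string prefixes, e.g. sf'..' -> f'..'."""
    lines = io.StringIO(source).readlines()
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    edits = []  # (row, start_col, end_col, replacement)
    for name, following in zip(tokens, tokens[1:]):
        # "s" is not a real prefix, so the tokenizer reports it as a NAME
        # token that ends exactly where the string token begins.
        if name.type != tokenize.NAME or following.type != tokenize.STRING:
            continue
        if following.start != name.end or "s" not in name.string:
            continue
        rest = name.string.replace("s", "", 1)
        if rest.lower() not in VALID_PREFIXES:
            continue
        row, start_col = name.start
        edits.append((row, start_col, name.end[1], rest))
    # Apply edits from right to left so earlier columns stay valid.
    for row, start_col, end_col, rest in sorted(edits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start_col] + rest + line[end_col:]
    return "".join(lines)


print(strip_s_prefix("x = s'hi' + sf'{y}' + r'a'\n"))
```

Working on tokens rather than with a regex means that text such as s'...' appearing inside another string literal or a comment is left alone. Recent IPython versions expose input transformer hooks that could call a function like this before execution, but that wiring is an assumption to verify against the IPython documentation.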

Related

Why would someone pass a string through an empty function instead of using the string directly in Python?

I am working through some old code for a Tkinter GUI, and at the top of the code the authors create the following definition:
def _(text):
    return text
The code then proceeds to use the _ function around almost all of the strings being passed to the Tkinter widgets. For example:
editmenu.add_command(label=_("Paste"), command=parent.onPaste)
Is there a reason for using the function here as opposed to just passing the string?
I've removed the _ function from around a few of the strings and haven't run into any issues. Just curious if this has a real purpose.
This is a stub for a pattern typically used for internationalization. The pattern is explicitly documented at https://docs.python.org/3/library/gettext.html#deferred-translations
_("Foo") is intended to have _ look up the string Foo in the current user's configured language.
Putting the stub in now -- before any translation tables have actually been built -- makes it easy to search for strings that need to be translated, and avoids folks needing to go through the software figuring out which strings are intended for system-internal vs user-visible uses later. (It also helps guide programmers to avoid making common mistakes like using the same string constants for both user-visible display and save file formats or wire communications).
Understanding this is important, because it guides how this should be used. Don't use _("Save {}".format(filename)); instead, use _("Save {}").format(filename) so the folks who are eventually writing translation tables can adjust the format strings, and so those strings don't need to have user-provided content (which obviously can't be known ahead-of-time) within them.
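To make the difference concrete, here is a small sketch. The GERMAN dictionary is a hypothetical stand-in for a real translation table (a real project would load translations through gettext), and the file name is made up.

```python
# Hypothetical translation table; a real project would load this via gettext.
GERMAN = {"Save {}": "{} speichern"}


def _(text):
    """Look up text in the table, falling back to the original string."""
    return GERMAN.get(text, text)


filename = "report.txt"

# Translate the template first, then fill in the user-provided value.
good = _("Save {}").format(filename)

# Formatting first yields "Save report.txt", which no table can contain.
bad = _("Save {}".format(filename))

print(good)  # report.txt speichern
print(bad)   # Save report.txt
```

Note that the translated template is free to move the placeholder, which is only possible when the formatting happens after the lookup.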

Python docstring escaping tabs and universal newlines

I'd appreciate some help on an efficient Pythonic solution for this problem.
Our internal coding standards mandate certain information fields should be in a block comment at the top of the file. In Perl, this was obviously a block of text beginning with '#'.
I'm experimenting with including this information in the module docstring in Python. The problem is I need to access some of this information in the program.
I have surgically extended docstring_parser to recognise the information fields, and create a data structure. This all works.
Except that one of the fields includes the source file location. That's fine on Unix, but we are a cross platform shop, and Windows uses '\' as a path separator. Python decides to process this as universal newlines and tabs, with weird results.
So the string %workspace%\PythonLib\rr2\tests\test_rr2.py
gets rendered as:
%workspace%\PythonLib
r2 ests est_rr2.py
which isn't exactly readable anymore.
The fix I have attempted is based on repeated applications of str.replace(), but is there a better way?
@user2357112 is correct. The docstring can be made raw by beginning it with r""", and then everything works.
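A short illustration of the difference, using two made-up functions. The non-raw docstring is shortened to a path fragment that contains only the \r and \t escapes, so it runs without escape warnings.

```python
def cooked():
    """Source: lib\rr2\tests"""  # \r and \t become control characters


def raw():
    r"""Source: %workspace%\PythonLib\rr2\tests\test_rr2.py"""


# repr() makes the control characters in the non-raw docstring visible.
print(repr(cooked.__doc__))

# The raw docstring keeps every backslash exactly as written.
print(raw.__doc__)
```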

IPython change input cell syntax highlighting logic for entire session

You can use extensions or display helpers in IPython to make whatever syntax highlighting you'd like on output cells.
For some special cell magics, like %%javascript you can also see the input cell itself is rendered with that language's natural syntax highlighting.
How can you cause every input cell to be displayed with some chosen, non-Python syntax highlighting (regardless of any magics used on a cell, and regardless of whether the cell contains Python code or some other language)?
In my case I am working with a custom-made cell magic for a proprietary language. The %%javascript syntax highlighting works well for this language, but if I have my own %%proprietarylang magic function, I obviously can't use %%javascript to help me with how the cell is displayed.
Is there a setting I can enable when I launch the notebook, or some property of the ipython object itself that can be programmatically set inside of my magic function, which will cause the same display logic to happen as if it was %%javascript?
I know that general-purpose on-the-fly syntax highlighting is not supported by the notebook currently. I'm asking specifically about how to make use of pre-existing syntax highlighting effects, such as that of %%javascript.
I've seen some documentation referring to IPython.config.cell_magic_highlight but this does not seem to exist anymore. Is there a standard replacement for it?
To replace IPython.config.cell_magic_highlight, you can use something like
import IPython
js = "IPython.CodeCell.config_defaults.highlight_modes['magic_fortran'] = {'reg':[/^%%fortran/]};"
IPython.core.display.display_javascript(js, raw=True)
so cells which begin with %%fortran will be syntax-highlighted like FORTRAN. (However, they will still be evaluated as python if you do only this.)
For recent IPython versions, the selected answer no longer works. The config_defaults property was renamed options_default (IPython 6.0.0).
The following works:
import IPython
js = "IPython.CodeCell.options_default.highlight_modes['magic_fortran'] = {'reg':[/^%%fortran/]};"
IPython.core.display.display_javascript(js, raw=True)

Python Regex match multiline Java annotation

I am trying to take advantage of JAXB code generation from an XML Schema for use in an Android project through the SimpleXML library, which uses a different type of annotation than JAXB (I do not want to include a 9MB lib to support JAXB in my Android project). See the question I asked previously.
Basically, I am writing a small Python script to perform the required changes on each Java file generated through the xjc tool, and so far it is working for import deletion/modification, single-line annotations, and also the annotation for which a List @XmlElement needs to be converted to an @ElementList one.
The only issue I am facing right now is removing annotations that span several lines, such as @XmlSeeAlso or @XmlType, like the following
@XmlType(name = "AnimatedPictureType", propOrder = {
    "resources",
    "animation",
    "caption"
})
or
@XmlSeeAlso({
    BackgroundRGBColorType.class,
    ForegroundRGBColorType.class
})
I tried different strategies using either MULTILINE, DOTALL, or both, but without any success. I am new to "advanced" regex usage as well as Python so I am probably missing something silly.
For my simple XSD processing, this is the only step I cannot get running to achieve a fully automated script that uses xjc and then automatically converts JAXB annotations into Simple XML ones.
Thank you in advance for your help.
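@Xml.*?\}\) with DOTALL enabled should, as far as I know, match any annotation starting with @Xml and ending with "})", even when it spans multiple lines. The lazy .*? stops at the first "})", so a single match does not run on into a later annotation.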
For a good view of what your regex actually matches you could always test your regular expressions at websites like https://pythex.org/
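As a rough sketch of how this could look in a Python script: the Java snippet below is made up around the two annotations from the question, and the pattern assumes that only @XmlType and @XmlSeeAlso need removing and that each one ends with "})". It also shows why a greedy .* is risky with DOTALL.

```python
import re

# Made-up Java source built around the two annotations from the question.
java_source = '''@XmlType(name = "AnimatedPictureType", propOrder = {
    "resources",
    "animation",
    "caption"
})
public class AnimatedPictureType {
}

@XmlSeeAlso({
    BackgroundRGBColorType.class,
    ForegroundRGBColorType.class
})
public class ColorType {
}
'''

# Lazy .*? stops at the first "})"; DOTALL lets "." cross line breaks.
LAZY = re.compile(r"@Xml(?:Type|SeeAlso)\(.*?\}\)\s*\n", re.DOTALL)
cleaned = LAZY.sub("", java_source)

# Greedy .* runs to the LAST "})" and swallows the class in between.
GREEDY = re.compile(r"@Xml.*\}\)", re.DOTALL)
over_deleted = GREEDY.sub("", java_source)

print(cleaned)
```

A regex like this will not cope with "})" appearing inside a string literal within the annotation; for generated JAXB code that is probably acceptable, but it is worth checking on real output.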

Escaping dollar sign in ipython notebook

I have a Markdown cell in IPython that contains four dollar signs. IPython interprets anything between dollar signs as a MathJax expression, which is not what I want. How do I escape the dollar signs? Escaping them with a backslash prevents MathJax from kicking in, but the backslash shows in the compiled Markdown.
Any ideas on how to get just the dollar sign?
Thanks
Put two backslashes in front of dollar signs. For example:
Some prices: \\$3.10, \\$4.25, \\$8.50.
(running Jupyter notebook server 5.7.0)
You can escape $ with math mode by using a backslash. Try $\$$
If you use <span>$</span>, MathJax won't process it as a delimiter. You should be able to enter that in Markdown. For example, I've used that here: $ This is not math $.
I was not able to get most of these solutions working. One that did work, though, is explained here. It refers to a SO question here and is as simple as:
<span class="tex2jax_ignore">$900 vs $4,500</span>
Hope this helps someone!
I'm aware that this topic is old, but it's still somehow the first google result and its answers are incomplete.
You can also surround the $ with `backticks`, the same way that you would display code in Jupyter.
So $ becomes `$`, and should display without error
Did you try using the equivalent HTML entity instead?
e.g. &#36;
Old post, but it's still at the top of Google! Here is a clarification I wish I had understood sooner:
Jupyter Notebook Markdown
Jupyter Notebook "Markdown" cells allow you to write HTML in them, and will render the HTML as expected in most cases. This is different from some other Markdown editors where you can only use a very limited subset of HTML, or no HTML at all.
Why This Matters for the $ Literal
A very helpful aspect to this is when you want to use the literal version of a special character such as $. Meaning, you want a $ to appear in the rendered version of the cell, rather than using $ to start a MathJax block.
This goes beyond $ to other special characters. Maybe you want a * to appear in the rendered cell, rather than indicating a bullet point or bold/italic text. I'm having a harder time imagining the use case for # or _ or [ but there might be some reason you would want to write those literals as well (without surrounding them with backticks, causing them to be styled as inline code blocks like I have here).
HTML Approach 1: Create an HTML Block
A couple of the other answers describe surrounding the $ with an HTML <span> tag. What this is doing (if you don't include a newline in the middle like this GitHub comment shows) is telling Jupyter Notebook not to interpret the content as Markdown any more, and instead to interpret it as HTML. In HTML, $ doesn't mean that you're starting a MathJax block, so it will render literally.
Personally I don't like this approach because it's hard to tell what problem the span is solving, and it's using span as more of a hack/workaround than how it is intended to be used in the HTML spec.
HTML Approach 2: HTML Character Encoding
This is what the &#36; answer above is suggesting, although I don't think it's fully explained.
The idea here is that instead of putting the $ literal in your Markdown source, you put the HTML character code version &#36; instead. Here is a longer explanation of these character codes, if you're curious. But essentially all you need to know is that if you write the relevant character code when editing the source of the Markdown cell, it won't be interpreted as Markdown, but it will render correctly.
So instead of this Markdown source:
The price range is $100 to $200
(which will render 100 to as MathJax)
Simply replace the $s with &#36;s:
The price range is &#36;100 to &#36;200
(which will render the dollar signs literally)
I prefer this approach because the entire purpose of HTML character encoding is escaping special characters, so if someone in the future is trying to understand what your code is doing, they should be able to figure it out pretty quickly.
I also just in general try to avoid HTML opening and closing tags within my Jupyter Notebook Markdown since small typos (like forgetting a /) can have confusing cascading implications, but your mileage may vary :)
If you're looking for character codes other than $, I like this resource, although you can also just Google "HTML code" plus the character you want and get the right answer most of the time.
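If the Markdown source is generated from Python, the replacement can be automated. This is only a sketch: escape_dollars is a made-up helper name, and it assumes the text contains no dollar signs that are meant to start MathJax.

```python
def escape_dollars(markdown_source: str) -> str:
    """Replace every literal $ with its HTML character code."""
    return markdown_source.replace("$", "&#36;")


print(escape_dollars("The price range is $100 to $200"))
```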
