Python Regex match multiline Java annotation - python

I am trying to take advantage of JAXB code generation from an XML Schema for use in an Android project through the SimpleXML library, which uses a different set of annotations than JAXB (I do not want to include a 9MB lib to support JAXB in my Android project). See question previously asked
Basically, I am writing a small Python script to perform the required changes on each Java file generated by the xjc tool. So far it works for import deletion/modification, single-line annotations, and also the case where an @XmlElement on a List needs to be converted to an @ElementList.
The only issue I am facing right now is removing annotations that span several lines, such as @XmlSeeAlso or @XmlType, like the following
@XmlType(name = "AnimatedPictureType", propOrder = {
    "resources",
    "animation",
    "caption"
})
or
@XmlSeeAlso({
    BackgroundRGBColorType.class,
    ForegroundRGBColorType.class
})
I tried different strategies using re.MULTILINE, re.DOTALL, or both, but without any success. I am new to "advanced" regex usage as well as to Python, so I am probably missing something silly.
For my simple XSD processing, this is the only step I cannot get running to achieve a fully automated script that runs xjc and then automatically converts the JAXB annotations into Simple XML ones.
Thank you in advance for your help.

@Xml.*?\}\) with DOTALL enabled should, as far as I know, match any annotation starting with @Xml and ending with "})", even when it spans multiple lines. Note the lazy .*?: a greedy .* with DOTALL would run on to the last "})" in the file.
For a good view of what your regex actually matches you could always test your regular expressions at websites like https://pythex.org/
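A minimal sketch of how the multi-line annotations could be stripped, assuming the generated sources use Java's @ prefix and that each annotation closes with "})" at the end of a line. The names ANNOTATION and strip_annotations are illustrative only.

```python
import re

# DOTALL lets "." cross line breaks; MULTILINE lets "^" anchor at each line start.
# The lazy ".*?" stops at the first closing "})" instead of the last one in the file.
ANNOTATION = re.compile(
    r"^[ \t]*@Xml(?:Type|SeeAlso)\(.*?\}\)[ \t]*\n",
    re.DOTALL | re.MULTILINE,
)


def strip_annotations(source):
    """Return the Java source with multi-line @XmlType/@XmlSeeAlso removed."""
    return ANNOTATION.sub("", source)


SAMPLE = '''@XmlSeeAlso({
    BackgroundRGBColorType.class,
    ForegroundRGBColorType.class
})
@XmlType(name = "AnimatedPictureType", propOrder = {
    "resources",
    "animation",
    "caption"
})
public class AnimatedPictureType {
}
'''

cleaned = strip_annotations(SAMPLE)
```

Single-line annotations are left alone by this pattern, so it can run alongside the per-line replacements the script already does.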

Related

Why would someone pass a string through an empty function instead of using the string directly in Python?

I am working through some old code for a Tkinter GUI, and at the top of the code the authors create the following definition:
def _(text):
    return text
The code then proceeds to wrap almost all of the strings passed to the Tkinter widgets in the _ function. For example:
editmenu.add_command(label=_("Paste"), command=parent.onPaste)
Is there a reason for using the function here as opposed to just passing the string?
I've removed the _ function from around a few of the strings and haven't run into any issues. Just curious if this has a real purpose.
This is a stub for a pattern typically used for internationalization. The pattern is explicitly documented at https://docs.python.org/3/library/gettext.html#deferred-translations
_("Foo") is intended to have _ look up the string Foo in the current user's configured language.
Putting the stub in now -- before any translation tables have actually been built -- makes it easy to search for strings that need to be translated, and avoids folks needing to go through the software figuring out which strings are intended for system-internal vs user-visible uses later. (It also helps guide programmers to avoid making common mistakes like using the same string constants for both user-visible display and save file formats or wire communications).
Understanding this is important, because it guides how this should be used. Don't use _("Save {}".format(filename)); instead, use _("Save {}").format(filename) so the folks who eventually write translation tables can adjust the format strings, and so those strings don't need to have user-provided content (which obviously can't be known ahead of time) within them.
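To make the difference concrete, here is a small sketch in which the stub is replaced by a lookup in a hypothetical one-entry translation table (TRANSLATIONS and its French text are invented for illustration; a real project would use gettext catalogs instead). It uses str.format with {} placeholders.

```python
# Hypothetical translation table standing in for a real gettext catalog.
TRANSLATIONS = {"Save {}": "Enregistrer {}"}


def _(text):
    """Look the string up; fall back to the original when no entry exists."""
    return TRANSLATIONS.get(text, text)


filename = "report.txt"

# Translate the template first, then fill in the user-provided part.
translatable = _("Save {}").format(filename)

# Formatting first produces "Save report.txt", which no table can contain.
untranslatable = _("Save {}".format(filename))
```

With the template translated before formatting, the lookup key is always the fixed string "Save {}", whatever the file is called.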

Custom highlighting for string literals in Jupyter

Is there prior art for either of the following?
(Simple?) Add a new cell type to Jupyter notebook and apply custom syntax highlighting to all string literals in cells of that type while otherwise treating these cells exactly like regular Python cells. I'd be fine to do the parsing myself, but I'm not sure how one would split the highlighting work between Jupyter and my custom highlighter. I'd be happy to dig around the Jupyter source code to make this work, i.e., I'd accept this not having the form of a regular Jupyter extension in the end.
(Hard?) Apply custom syntax highlighting to all strings with an s prefix (e.g. s'myString') and, before executing the cell, remove the s while keeping any additional valid Python string prefixes such as f, u, or r.
If not, could someone sketch the steps required to make this work? I don't know much JS so I'd prefer Python solutions, but I'm happy to learn the required JS if needed.

Python docstring escaping tabs and universal newlines

I'd appreciate some help on an efficient Pythonic solution for this problem.
Our internal coding standards mandate certain information fields should be in a block comment at the top of the file. In Perl, this was obviously a block of text beginning with '#'.
I'm experimenting with including this information in the module docstring in Python. The problem is I need to access some of this information in the program.
I have surgically extended docstring_parser to recognise the information fields, and create a data structure. This all works.
Except that one of the fields includes the source file location. That's fine on Unix, but we are a cross platform shop, and Windows uses '\' as a path separator. Python decides to process this as universal newlines and tabs, with weird results.
So the string %workspace%\PythonLib\rr2\tests\test_rr2.py
gets rendered as:
%workspace%\PythonLib
r2 ests est_rr2.py
which isn't exactly readable anymore.
The fix I have attempted is based on repeated applications of str.replace(), but is there a better way?
@user2357112 is correct. The docstring can be made raw by beginning it with r""", and then everything works.
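A short sketch of the effect, assuming the path sits inside a function docstring (the same applies to a module docstring). The function name documented is made up for the example.

```python
def documented():
    r"""Source: %workspace%\PythonLib\rr2\tests\test_rr2.py"""


# With the r prefix the backslashes reach __doc__ untouched.
raw_doc = documented.__doc__

# Without the prefix, \t in an ordinary string literal becomes a tab character.
cooked = "rr2\tests\test_rr2.py"
```

Here raw_doc still contains the literal text \rr2\tests\test_rr2.py, while cooked contains two tab characters and no backslashes at all.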

Programming in Python using a Non-English language for keywords and variables

My end goal is to enable people that know the language Urdu and not English to be able to program in a Python environment.
Urdu is written right to left. I imagine having Urdu versions of all Python keywords and using Urdu characters to define variable/function/class names.
Is this goal possible? If yes, then what all will need to be done?
I can already see one issue where the standard library functions would still be in English.
Update:
Whether this should be done or not is certainly a very interesting debate topic, which I'd love to talk about. However, is it actually possible and how?
I love Python and I know a lot of intelligent people that might never be taught English properly, only Urdu, hence the thought.
Chinese Python used to allow programmers to write source code entirely in Chinese, including translated module names, function names, and all keywords. Here's a code example from their website:
載入 系統 # import sys
文件名 = 系統.參數[1:] # filenames = sys.argv[1:]
定義 修正行尾(文件): # def fixline(file):
    內文 = 打開(文件).讀入() # text = open(file).read()
    內文 = 內文.替換('\n\r','\n') # text = text.replace('\n\r', '\n')
    傳回 內文 # return text
取 文件 自 文件名: # for file in filenames:
    寫 修正行尾(文件) # print fixline(file)
Sadly, it is no longer an active project, but you can get the source and find out what they changed to get an idea for your own implementation. (Learning why they failed may also help you understand what challenges you will have in making such a system successful).
You can use the ideas project to transform keywords; it has an example for making a kind of French Python. It's based on Python import hooks, so it runs in normal Python 3.
I built a Hebrew Python using this project (it also translates the Python builtins, not just the keywords), and it looks more like normal Python.
Yes, it is possible. But it requires you to download Python from source, and change the source code and compile it. If I remember correctly from another discussion on the same topic, changing the grammar and regenerating some files is all it takes, except for when you are doing advanced stuff like meta-programming and things.
WHAT HAVE YOU TRIED? Python 3 allows you to use Unicode for all your function, class and variable names. Check it out; you just need to make sure you're using UTF-8 for your script, which is primarily a matter for your editor. Dealing gracefully with the mixed RTL/LTR issues is also your editor's problem. (The feature is discussed in PEP 3131.)
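As a tiny illustration of the Unicode identifiers mentioned above, the following runs on an unmodified Python 3, assuming the file is saved as UTF-8. The Urdu names are only examples chosen for this sketch; the keywords (def, return) remain English.

```python
# Function and variable names written in Urdu script; keywords stay English.
def جمع(پہلا, دوسرا):
    return پہلا + دوسرا


نتیجہ = جمع(2, 3)
```

How the mixed right-to-left and left-to-right text is displayed depends entirely on the editor or terminal.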
The python language does not have "dialects", i.e., alternative sets of keywords. If you are not satisfied with Urdu identifiers and you are determined to have a completely Urdu-language experience, you could write a preprocessor that maps Urdu "keywords" to the corresponding (English) python keyword.
It shouldn't be too hard to wrap this kind of preprocessor around the interactive console and import modules, so that it works transparently to the user (and without recompiling python from source). If you can program and want to try this on for size, check out python's code module. It's designed to read python source and send it on to be compiled and executed. You just step in and add a preprocessing step.
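A rough sketch of such a preprocessing step, built on the standard tokenize module so that only name tokens are rewritten and string contents are left alone. The keyword table uses made-up ASCII transliterations purely as placeholders; a real table would hold Urdu words, and translate is a hypothetical helper name.

```python
import io
import tokenize

# Placeholder spellings for illustration only; not an agreed-upon vocabulary.
KEYWORDS = {"tareef": "def", "agar": "if", "warna": "else", "wapas": "return"}


def translate(source):
    """Replace localized keyword tokens with the English Python keywords."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            tok = tok._replace(string=KEYWORDS[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


LOCALIZED = (
    "tareef double(x):\n"
    "    agar x > 0:\n"
    "        wapas x * 2\n"
    "    warna:\n"
    "        wapas 0\n"
)

namespace = {}
exec(translate(LOCALIZED), namespace)  # compile and run the translated source
```

The same translate function could be called before handing source to the code module or to an import hook; this sketch only covers the rewriting itself.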
I'd use a preprocessor for keywords and built-ins. But providing localized wrappers for your choice of common modules is even easier, actually. Let's demonstrate with a wrapper module RE.py that contains all exported identifiers of regular re, but renamed to upper case:
import re as _re
for name in _re.__all__:
    locals()[name.upper()] = _re.__dict__[name]
That was it! You can now import this module and use it:
import RE
docs = RE.SUB(r"\bEnglish\b", r"Urdu", docs)
Use Urdu words instead of uppercase (of course you need to specify each one individually) and you're on your way. As you say, whether all this is a good idea is a different question.
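A sketch of the "specify each one individually" variant, assuming an explicit table from localized names to the names exported by re. The localized spellings below are invented placeholders, not real translations.

```python
import re as _re

# Hypothetical localized names mapped to the real re functions they wrap.
NAMES = {
    "badlo": "sub",
    "dhundo": "search",
}

# Publish each re function under its localized name in this module's namespace.
for local_name, original_name in NAMES.items():
    globals()[local_name] = getattr(_re, original_name)

docs = badlo(r"\bEnglish\b", "Urdu", "English docs")
```

Anything not listed in the table simply stays unavailable under a localized name, which keeps the wrapper's surface explicit.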

Getting Vim to be aware of ctag type annotations for python

I use Vim+Ctags to write Python, and my problem is that Vim often jumps to the import for a tag, rather than the definition. This is a common issue, and has already been addressed in a few posts here.
This post shows how to remove the imports from the tags file. This works quite well, except that sometimes it is useful to have tags from the imports (e.g. when you want to list all places where a class/function has been imported).
This post shows how to get to the definition without removing the imports from the tags file. This is basically what I've been doing so far (just remapped :tjump to a single keystroke). However, you still need to navigate the list of tags that comes up to find the definition entry.
It would be nice if it were possible to just tell Vim to "go to the definition" with a single key chord (e.g. ). Exuberant Ctags annotates the tag entries with the type of entry (e.g. c for classes, i for imports). Does anyone know if there is a way to get Vim to utilize these annotations, so that I could say things like "go to the first tag that is not of type i"?
Unfortunately, there's no way for Vim itself to do that inference business and jump to an import or a definition depending on some context: when searching for a tag in your tags file, Vim stops at the first match whatever it is. A plugin may help but I'm not aware of such a thing.
Instead of <C-]> or :tag foo, you could use g] or :ts foo which shows you a list of matches (with kinds and a preview of the line of each match) instead of jumping to the first one. This way, you are able to tell Vim exactly where you want to go.
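For reference, the kind annotations are plain text in the tags file, so a helper outside Vim can filter on them. The following is only a sketch in Python, assuming the extended ctags format (name, file, address, then ;" and the kind field); the function name and the sample file paths are hypothetical.

```python
def non_import_matches(tag_lines, name):
    """Return (file, address, kind) for entries of `name` whose kind is not 'i'."""
    matches = []
    for line in tag_lines:
        if line.startswith("!_TAG_"):
            continue  # pseudo-tags describing the file itself
        head, sep, extension = line.rstrip("\n").partition(';"\t')
        if not sep:
            continue  # no extension fields, so no kind to inspect
        parts = head.split("\t", 2)
        if len(parts) != 3:
            continue
        tag, path, address = parts
        kind = extension.split("\t")[0]
        if kind.startswith("kind:"):
            kind = kind[len("kind:"):]
        if tag == name and kind != "i":
            matches.append((path, address, kind))
    return matches


SAMPLE_TAGS = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n",
    'Widget\tapp/main.py\t/^from app.widgets import Widget$/;"\ti\n',
    'Widget\tapp/widgets.py\t/^class Widget(object):$/;"\tc\n',
]

found = non_import_matches(SAMPLE_TAGS, "Widget")
```

Wiring such a filter into a Vim mapping is a separate step that this sketch does not cover.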
