Ultisnips python interpolation snippet, extracting number from filename - python

I am trying to make a snippet that will help me choose the right revision
number for migration, by reading all migration files from
application/migrations.
What I managed to do myself is that my filenames are being filtered while I am
typing, and when only one match left insert its revision number at the cursor
position (which are first 14 chars of filename always).
The problem is that when I hit TAB to select, I am also left with what I have
typed so far to search for that revision number, meaning something like this
remo20160812110447.
Question is, how to get rid of that remo in this case!?
NOTE: Example uses hardcoded values, for easier testing, those will be later
replaced by # lst = os.listdir('application/migrations') line.
Also an added bonus effect would be to present those 20160710171947 values as
human readable date format while choosing, but after hitting TAB insert their
original source version.
global !p
import datetime
def complete(t, opts):
if t:
opts = [ m for m in opts if t in m ]
if len(opts) == 1:
return opts[0][:14]
return "(" + '|'.join(opts) + ')'
endglobal
snippet cimigration "Inserts desired migration number, obtained via filenames"
$1`!p import os
# lst = os.listdir('application/migrations')
lst = [
'20160710171947_create.php',
'20160810112347_delete.php',
'20160812110447_remove.php'
]
snip.rv = complete(t[1], lst)`
endsnippet

This can definitely be performed in pure vimscript.
Here is a working prototype. It does work but has some issues with portability: global variables, reliance on iskeyword and uses two keybindings instead of one. But it was put together in an hour or so:
set iskeyword=#,48-57,_,-,.,192-255
let g:wordidx = 0
let g:word = ''
let g:match = 0
function! Suggest()
let l:glob = globpath('application/migrations', '*.php')
let l:files = map(split(l:glob), 'fnamemodify(v:val, ":t")')
let l:char = getline('.')[col('.')-1]
let l:word = ''
let l:suggestions = []
if l:char =~# '[a-zA-Z0-9_]'
if g:word ==# ''
let g:word = expand('<cword>')
let g:match = matchadd('ErrorMsg', g:word)
endif
let l:word = g:word
"let l:reg = '^' . l:word
let l:suggestions = filter(l:files, 'v:val =~ l:word')
if !empty(l:suggestions)
call add(l:suggestions, l:word)
"echo l:suggestions
let l:change = l:suggestions[g:wordidx]
let g:wordidx = (g:wordidx + 1) % len(l:suggestions)
"echo g:wordidx + 10
execute "normal! mqviwc" . l:change . "\<esc>`q"
endif
endif
"echo [l:word, l:suggestions]
endfunction
function! SuggestClear()
call matchdelete(g:match)
let g:wordidx = 0
let g:word = ''
let g:match = 0
endfunction
nnoremap <leader><tab> :call Suggest()<cr>
nnoremap <leader><cr> :call SuggestClear()<cr>
Adding this to your ~/.vimrc will allow you to steps through search matches with <leader><tab>. It will highlight the part that is being matched, to drop the highlight you need to type <leader><cr>.
You should always drop the highlight after use because the original search word is kept internally until you destroy it. Using <leader><tab> before clearing the match will substitute for the suggestions from the previous match.
Screencast (my leader is -):
If you have more vim questions join the vi.SE subsection of the website. You can probably get better answers there.

This can be achieved using post-expand-actions: https://github.com/SirVer/ultisnips/blob/master/doc/UltiSnips.txt#L1602

Related

Is there a way to insert around a character in a regex match?

Im converting .m to .py and i want to change the following.
for ii = 1:r
xt(i) = (fr*2-(i+fr-1))
end
to
for ii in range(1,r):
xt(i) = (fr*2-(i+fr-1))
So far I got this but cant ignore replacement of ii and r
Replace: for [a-z]{1,} = 1:
With: for ??? in range(1,???):
Is this even possible?
in case someone comes over this later, I got this to work with the following
(\b(for)) (\b[a-z]{1,}) (\B=) (\d{1,})(\b:)(\b[a-z]{1,})
replace with
$2 $3 in range($5,$7):

Bell character as Fields separator in Python print output

I am fairly new to Python and need a little help here.
I have a Python script running on Python 2.6 that parses some JSON.
Example Code:
if "prid" in data[p]["prdts"][n]:
print data[p]["products"][n]["prid"],
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
print data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"]
break
Now, this prints values in 2 columns:
prid price
123 20
234 40
As you see the fields separator above is ' '. How can I put a field separator like BEL character in the output?
Sample expected output:
prid price
123^G20
234^G40
FWIW, your while loop doesn't increment i, so it will loop forever, but I assume that was just a copy & paste error, and I'll ignore it in the rest of my answer.
If you want to use two separate print statements to print your data on one line you can't avoid getting that space produced by the first print statement. Instead, simply save the prid data until you can print it with the price in one go using string concatenation. Eg,
if "prid" in data[p]["prdts"][n]:
prid = [data[p]["products"][n]["prid"]]
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
price = data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"]
print str(prid) + '\a' + str(price)
break
Note that I'm explicitly converting the prid and price to string. Obviously, if either of those items is already a string then you don't need to wrap it in str(). Normally, we can let print convert objects to string for us, but we can't do
print prid, '\a', price
here because that will give us an unwanted space between each item.
Another approach is to make use of the new print() function, which we can import using a __future__ import at the top of the script, before other imports:
from __future__ import print_function
# ...
if "prid" in data[p]["prdts"][n]:
print(data[p]["products"][n]["prid"], end='\a')
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
print(data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"])
break
I don't understand why you want to use BEL as a separator rather than something more conventional, eg TAB. The BEL char may print as ^G in your terminal, but it's invisible in mine, and if you save this output to a text file it may not display correctly in a text viewer / editor.
BTW, It would have been better if you posted a Minimal, Complete, Verifiable Example that focuses on your actual printing problem, rather than all that crazy JSON parsing stuff, which just makes your question look more complicated than it really is, and makes it impossible to test your code or their modifications to it.

Vim obtain string between visual selection range with Python

Here is some text
here is line two of text
I visually select from is to is in Vim: (brackets represent the visual selection [ ])
Here [is some text
here is] line two of text
Using Python, I can obtain the range tuples of the selection:
function! GetRange()
python << EOF
import vim
buf = vim.current.buffer # the buffer
start = buf.mark('<') # start selection tuple: (1,5)
end = buf.mark('>') # end selection tuple: (2,7)
EOF
endfunction
I source this file: :so %, select the text visually, run :<,'>call GetRange() and
now that I have (1,5) and (2,7). In Python, how can I compile the string that is the following:
is some text\nhere is
Would be nice to:
Obtain this string for future manipulation
then replace this selected range with the updated/manipulated string
Try this:
fun! GetRange()
python << EOF
import vim
buf = vim.current.buffer
(lnum1, col1) = buf.mark('<')
(lnum2, col2) = buf.mark('>')
lines = vim.eval('getline({}, {})'.format(lnum1, lnum2))
lines[0] = lines[0][col1:]
lines[-1] = lines[-1][:col2]
print "\n".join(lines)
EOF
endfun
You can use vim.eval to get python values of vim functions and variables.
This would probably work if you used pure vimscript
function! GetRange()
let #" = substitute(#", '\n', '\\n', 'g')
endfunction
vnoremap ,r y:call GetRange()<CR>gvp
This will convert all newlines into \n in the visual selection and replace the selection with that string.
This mapping yanks the selection into the " register. Calls the function (isn't really necessary since its only one command). Then uses gv to reselect the visual selection and then pastes the quote register back onto the selected region.
Note: in vimscript all user defined functions must start with an Uppercase letter.
Here's another version based on Conner's answer. I took qed's suggestion and also added a fix for when the selection is entirely within one line.
import vim
def GetRange():
buf = vim.current.buffer
(lnum1, col1) = buf.mark('<')
(lnum2, col2) = buf.mark('>')
lines = vim.eval('getline({}, {})'.format(lnum1, lnum2))
if len(lines) == 1:
lines[0] = lines[0][col1:col2 + 1]
else:
lines[0] = lines[0][col1:]
lines[-1] = lines[-1][:col2 + 1]
return "\n".join(lines)

difflib python formatting

I am using this code to find difference between two csv list and hove some formatting questions. This is probably an easy fix, but I am new and trying to learn and having alot of problems.
import difflib
diff=difflib.ndiff(open('test1.csv',"rb").readlines(), open('test2.csv',"rb").readlines())
try:
while 1:
print diff.next(),
except:
pass
the code works fine and I get the output I am looking for as:
Group,Symbol,Total
- Adam,apple,3850
? ^
+ Adam,apple,2850
? ^
bob,orange,-45
bob,lemon,66
bob,appl,-56
bob,,88
My question is how do I clean the formatting up, can I make the Group,Symbol,Total into sperate columns, and the line up the text below?
Also can i change the ? to represent a text I determine? such as test 1 and test 2 representing which sheet it comes from?
thanks for any help
Using difflib.unified_diff gives much cleaner output, see below.
Also, both difflib.ndiff and difflib.unified_diff return a Differ object that is a generator object, which you can directly use in a for loop, and that knows when to quit, so you don't have to handle exceptions yourself. N.B; The comma after line is to prevent print from adding another newline.
import difflib
s1 = ['Adam,apple,3850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
'bob,appl,-56\n', 'bob,,88\n']
s2 = ['Adam,apple,2850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
'bob,appl,-56\n', 'bob,,88\n']
for line in difflib.unified_diff(s1, s2, fromfile='test1.csv',
tofile='test2.csv'):
print line,
This gives:
--- test1.csv
+++ test2.csv
## -1,4 +1,4 ##
-Adam,apple,3850
+Adam,apple,2850
bob,orange,-45
bob,lemon,66
bob,appl,-56
So you can clearly see which lines were changed between test1.csv and test1.csv.
To line up the columns, you must use string formatting.
E.g. print "%-20s %-20s %-20s" % (row[0],row[1],row[2]).
To change the ? into any text test you like, you'd use s.replace('any text i like').
Your problem has more to do with the CSV format, since difflib has no idea it's looking at columnar fields. What you need is to figure out into which field the guide is pointing, so that you can adjust it when printing the columns.
If your CSV files are simple, i.e. they don't contain any quoted fields with embedded commas or (shudder) newlines, you can just use split(',') to separate them into fields, and figure out where the guide points as follows:
def align(line, guideline):
"""
Figure out which field the guide (^) points to, and the offset within it.
E.g., if the guide points 3 chars into field 2, return (2, 3)
"""
fields = line.split(',')
guide = guideline.index('^')
f = p = 0
while p + len(fields[f]) < guide:
p += len(fields[f]) + 1 # +1 for the comma
f += 1
offset = guide - p
return f, offset
Now it's easy to show the guide properly. Let's say you want to align your columns by printing everything 12 spaces wide:
diff=difflib.ndiff(...)
for line in diff:
code = line[0] # The diff prefix
print code,
if code == '?':
fld, offset = align(lastline, line[2:])
for f in range(fld):
print "%-12s" % '',
print ' '*offset + '^'
else:
fields = line[2:].rstrip('\r\n').split(',')
for f in fields:
print "%-12s" % f,
print
lastline = line[2:]
Be warned that the only reliable way to parse CSV files is to use the csv module (or a robust alternative); but getting it to play well with the diff format (in full generality) would be a bit of a headache. If you're mainly interested in readability and your CSV isn't too gnarly, you can probably live with an occasional mix-up.

Can’t fix pyparsing error…

Overview
So, I’m in the middle of refactoring a project, and I’m separating out a bunch of parsing code. The code I’m concerned with is pyparsing.
I have a very poor understanding of pyparsing, even after spending a lot of time reading through the official documentation. I’m having trouble because (1) pyparsing takes a (deliberately) unorthodox approach to parsing, and (2) I’m working on code I didn’t write, with poor comments, and a non-elementary set of existing grammars.
(I can’t get in touch with the original author, either.)
Failing Test
I’m using PyVows to test my code. One of my tests is as follows (I think this is clear even if you’re unfamiliar with PyVows; let me know if it isn’t):
def test_multiline_command_ends(self, topic):
output = parsed_input('multiline command ends\n\n',topic)
expect(output).to_equal(
r'''['multiline', 'command ends', '\n', '\n']
- args: command ends
- multiline_command: multiline
- statement: ['multiline', 'command ends', '\n', '\n']
- args: command ends
- multiline_command: multiline
- terminator: ['\n', '\n']
- terminator: ['\n', '\n']''')
But when I run the test, I get the following in the terminal:
Failed Test Results
Expected topic("['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline")
to equal "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"
Note:
Since the output is to a Terminal, the expected output (the second one) has extra backslashes. This is normal. The test ran without issue before this piece of refactoring began.
Expected Behavior
The first line of output should match the second, but it doesn’t. Specifically, it’s not including the two newline characters in that first list object.
So I’m getting this:
"['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline"
When I should be getting this:
"['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"
Earlier in the code, there is also this statement:
pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')
…Which I think should prevent exactly this kind of error. But I’m not sure.
Even if the problem can’t be identified with certainty, simply narrowing down where the problem is would be a HUGE help.
Please let me know how I might take a step or two towards fixing this.
Edit: So, uh, I should post the parser code for this, shouldn’t I? (Thanks for the tip, #andrew cooke !)
Parser code
Here’s the __init__ for my parser object.
I know it’s a nightmare. That’s why I’m refactoring the project. ☺
def __init__(self, Cmd_object=None, *args, **kwargs):
# #NOTE
# This is one of the biggest pain points of the existing code.
# To aid in readability, I CAPITALIZED all variables that are
# not set on `self`.
#
# That means that CAPITALIZED variables aren't
# used outside of this method.
#
# Doing this has allowed me to more easily read what
# variables become a part of other variables during the
# building-up of the various parsers.
#
# I realize the capitalized variables is unorthodox
# and potentially anti-convention. But after reaching out
# to the project's creator several times over roughly 5
# months, I'm still working on this project alone...
# And without help, this is the only way I can move forward.
#
# I have a very poor understanding of the parser's
# control flow when the user types a command and hits ENTER,
# and until the author (or another pyparsing expert)
# explains what's happening to me, I have to do silly
# things like this. :-|
#
# Of course, if the impossible happens and this code
# gets cleaned up, then the variables will be restored to
# proper capitalization.
#
# —Zearin
# http://github.com/zearin/
# 2012 Mar 26
if Cmd_object is not None:
self.Cmd_object = Cmd_object
else:
raise Exception('Cmd_object be provided to Parser.__init__().')
# #FIXME
# Refactor methods into this class later
preparse = self.Cmd_object.preparse
postparse = self.Cmd_object.postparse
self._allow_blank_lines = False
self.abbrev = True # Recognize abbreviated commands
self.case_insensitive = True # Commands recognized regardless of case
# make sure your terminators are not in legal_chars!
self.legal_chars = u'!#$%.:?#_' + PYP.alphanums + PYP.alphas8bit
self.multiln_commands = [] if 'multiline_commands' not in kwargs else kwargs['multiln_commands']
self.no_special_parse = {'ed','edit','exit','set'}
self.redirector = '>' # for sending output to file
self.reserved_words = []
self.shortcuts = { '?' : 'help' ,
'!' : 'shell',
'#' : 'load' ,
'##': '_relative_load'
}
# self._init_grammars()
#
# def _init_grammars(self):
# #FIXME
# Add Docstring
# ----------------------------
# Tell PYP how to parse
# file input from '< filename'
# ----------------------------
FILENAME = PYP.Word(self.legal_chars + '/\\')
INPUT_MARK = PYP.Literal('<')
INPUT_MARK.setParseAction(lambda x: '')
INPUT_FROM = FILENAME('INPUT_FROM')
INPUT_FROM.setParseAction( self.Cmd_object.replace_with_file_contents )
# ----------------------------
#OUTPUT_PARSER = (PYP.Literal('>>') | (PYP.WordStart() + '>') | PYP.Regex('[^=]>'))('output')
OUTPUT_PARSER = (PYP.Literal( 2 * self.redirector) | \
(PYP.WordStart() + self.redirector) | \
PYP.Regex('[^=]' + self.redirector))('output')
PIPE = PYP.Keyword('|', identChars='|')
STRING_END = PYP.stringEnd ^ '\nEOF'
TERMINATORS = [';']
TERMINATOR_PARSER = PYP.Or([
(hasattr(t, 'parseString') and t)
or
PYP.Literal(t) for t in TERMINATORS
])('terminator')
self.comment_grammars = PYP.Or([ PYP.pythonStyleComment,
PYP.cStyleComment ])
self.comment_grammars.ignore(PYP.quotedString)
self.comment_grammars.setParseAction(lambda x: '')
self.comment_grammars.addParseAction(lambda x: '')
self.comment_in_progress = '/*' + PYP.SkipTo(PYP.stringEnd ^ '*/')
# QuickRef: Pyparsing Operators
# ----------------------------
# ~ creates NotAny using the expression after the operator
#
# + creates And using the expressions before and after the operator
#
# | creates MatchFirst (first left-to-right match) using the
# expressions before and after the operator
#
# ^ creates Or (longest match) using the expressions before and
# after the operator
#
# & creates Each using the expressions before and after the operator
#
# * creates And by multiplying the expression by the integer operand;
# if expression is multiplied by a 2-tuple, creates an And of
# (min,max) expressions (similar to "{min,max}" form in
# regular expressions); if min is None, intepret as (0,max);
# if max is None, interpret as expr*min + ZeroOrMore(expr)
#
# - like + but with no backup and retry of alternatives
#
# * repetition of expression
#
# == matching expression to string; returns True if the string
# matches the given expression
#
# << inserts the expression following the operator as the body of the
# Forward expression before the operator
# ----------------------------
DO_NOT_PARSE = self.comment_grammars | \
self.comment_in_progress | \
PYP.quotedString
# moved here from class-level variable
self.URLRE = re.compile('(https?://[-\\w\\./]+)')
self.keywords = self.reserved_words + [fname[3:] for fname in dir( self.Cmd_object ) if fname.startswith('do_')]
# not to be confused with `multiln_parser` (below)
self.multiln_command = PYP.Or([
PYP.Keyword(c, caseless=self.case_insensitive)
for c in self.multiln_commands
])('multiline_command')
ONELN_COMMAND = ( ~self.multiln_command +
PYP.Word(self.legal_chars)
)('command')
#self.multiln_command.setDebug(True)
# Configure according to `allow_blank_lines` setting
if self._allow_blank_lines:
self.blankln_termination_parser = PYP.NoMatch
else:
BLANKLN_TERMINATOR = (2 * PYP.lineEnd)('terminator')
#BLANKLN_TERMINATOR('terminator')
self.blankln_termination_parser = (
(self.multiln_command ^ ONELN_COMMAND)
+ PYP.SkipTo(
BLANKLN_TERMINATOR,
ignore=DO_NOT_PARSE
).setParseAction(lambda x: x[0].strip())('args')
+ BLANKLN_TERMINATOR
)('statement')
# CASE SENSITIVITY for
# ONELN_COMMAND and self.multiln_command
if self.case_insensitive:
# Set parsers to account for case insensitivity (if appropriate)
self.multiln_command.setParseAction(lambda x: x[0].lower())
ONELN_COMMAND.setParseAction(lambda x: x[0].lower())
self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)^'*')('idx')
+ PYP.Optional(PYP.Word(self.legal_chars + '/\\'))('fname')
+ PYP.stringEnd)
AFTER_ELEMENTS = PYP.Optional(PIPE +
PYP.SkipTo(
OUTPUT_PARSER ^ STRING_END,
ignore=DO_NOT_PARSE
)('pipeTo')
) + \
PYP.Optional(OUTPUT_PARSER +
PYP.SkipTo(
STRING_END,
ignore=DO_NOT_PARSE
).setParseAction(lambda x: x[0].strip())('outputTo')
)
self.multiln_parser = (((self.multiln_command ^ ONELN_COMMAND)
+ PYP.SkipTo(
TERMINATOR_PARSER,
ignore=DO_NOT_PARSE
).setParseAction(lambda x: x[0].strip())('args')
+ TERMINATOR_PARSER)('statement')
+ PYP.SkipTo(
OUTPUT_PARSER ^ PIPE ^ STRING_END,
ignore=DO_NOT_PARSE
).setParseAction(lambda x: x[0].strip())('suffix')
+ AFTER_ELEMENTS
)
#self.multiln_parser.setDebug(True)
self.multiln_parser.ignore(self.comment_in_progress)
self.singleln_parser = (
( ONELN_COMMAND + PYP.SkipTo(
TERMINATOR_PARSER
^ STRING_END
^ PIPE
^ OUTPUT_PARSER,
ignore=DO_NOT_PARSE
).setParseAction(lambda x:x[0].strip())('args'))('statement')
+ PYP.Optional(TERMINATOR_PARSER)
+ AFTER_ELEMENTS)
#self.multiln_parser = self.multiln_parser('multiln_parser')
#self.singleln_parser = self.singleln_parser('singleln_parser')
self.prefix_parser = PYP.Empty()
self.parser = self.prefix_parser + (STRING_END |
self.multiln_parser |
self.singleln_parser |
self.blankln_termination_parser |
self.multiln_command +
PYP.SkipTo(
STRING_END,
ignore=DO_NOT_PARSE)
)
self.parser.ignore(self.comment_grammars)
# a not-entirely-satisfactory way of distinguishing
# '<' as in "import from" from
# '<' as in "lesser than"
self.input_parser = INPUT_MARK + \
PYP.Optional(INPUT_FROM) + \
PYP.Optional('>') + \
PYP.Optional(FILENAME) + \
(PYP.stringEnd | '|')
self.input_parser.ignore(self.comment_in_progress)
I suspect that the problem is pyparsing's builtin whitespace skipping, which will skip over newlines by default. Even though setDefaultWhitespaceChars is used to tell pyparsing that newlines are significant, this setting only affects all expressions that are created after the call to setDefaultWhitespaceChars. The problem is that pyparsing tries to help by defining a number of convenience expressions when it is imported, like empty for Empty(), lineEnd for LineEnd() and so on. But since these are all created at import time, they are defined with the original default whitespace characters, which include '\n'.
I should probably just do this in setDefaultWhitespaceChars, but you can clean this up for yourself too. Right after calling setDefaultWhitespaceChars, redefine these module-level expressions in pyparsing:
PYP.ParserElement.setDefaultWhitespaceChars(' \t')
# redefine module-level constants to use new default whitespace chars
PYP.empty = PYP.Empty()
PYP.lineEnd = PYP.LineEnd()
PYP.stringEnd = PYP.StringEnd()
I think this will help restore the significance of your embedded newlines.
Some other bits on your parser code:
self.blankln_termination_parser = PYP.NoMatch
should be
self.blankln_termination_parser = PYP.NoMatch()
Your original author might have been overly aggressive with using '^' over '|'. Only use '^' if there is some potential for parsing one expression accidentally when you would really have parsed a longer one that follows later in the list of alternatives. For instance, in:
self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)^'*')('idx')
There is no possible confusion between a Word of numeric digits or a lone '*'. Or (or '^' operator) tells pyparsing to try to evaluate all of the alternatives, and then pick the longest matching one - in case of a tie, chose the left-most alternative in the list. If you parse '*', there is no need to see if that might also match a longer integer, or if you parse an integer, no need to see if it might also pass as a lone '*'. So change this to:
self.save_parser = ( PYP.Optional(PYP.Word(PYP.nums)|'*')('idx')
Using a parse action to replace a string with '' is more simply written using a PYP.Suppress wrapper, or if you prefer, call expr.suppress() which returns Suppress(expr). Combined with preference for '|' over '^', this:
self.comment_grammars = PYP.Or([ PYP.pythonStyleComment,
PYP.cStyleComment ])
self.comment_grammars.ignore(PYP.quotedString)
self.comment_grammars.setParseAction(lambda x: '')
becomse:
self.comment_grammars = (PYP.pythonStyleComment | PYP.cStyleComment
).ignore(PYP.quotedString).suppress()
Keywords have built-in logic to automatically avoid ambiguity, so Or is completely unnecessary with them:
self.multiln_command = PYP.Or([
PYP.Keyword(c, caseless=self.case_insensitive)
for c in self.multiln_commands
])('multiline_command')
should be:
self.multiln_command = PYP.MatchFirst([
PYP.Keyword(c, caseless=self.case_insensitive)
for c in self.multiln_commands
])('multiline_command')
(In the next release, I'll loosen up those initializers to accept generator expressions so that the []'s will become unnecessary.)
That's all I can see for now. Hope this helps.
I fixed it!
Pyparsing was not at fault!
I was. ☹
By separating out the parsing code into a different object, I created the problem. Originally, an attribute used to “update itself” based on the contents of a second attribute. Since this all used to be contained in one “god class”, it worked fine.
Simply by separating the code into another object, the first attribute was set at instantiation, but no longer “updated itself” if the second attribute it depended on changed.
Specifics
The attribute multiln_command (not to be confused with multiln_commands—aargh, what confusing naming!) was a pyparsing grammar definition. The multiln_command attribute should have updated its grammar if multiln_commands ever changed.
Although I knew these two attributes had similar names but very different purposes, the similarity definitely made it harder to track the problem down. I have no renamed multiln_command to multiln_grammar.
However! ☺
I am grateful to #Paul McGuire’s awesome answer, and I hope it saves me (and others) some grief in the future. Although I feel a bit foolish that I caused the problem (and misdiagnosed it as a pyparsing issue), I’m happy some good (in the form of Paul’s advice) came of asking this question.
Happy parsing, everybody. :)

Categories