Generate parser for Python3 in python, using ANTLR 4.6


I'm using the ANTLRv4 Python3 grammar from here:
https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4
and running:
java -jar antlr-4.6-complete.jar -Dlanguage=Python2 Python3.g4
to generate Python3Lexer.py + some other files.
However, Python3Lexer.py contains code which is not Python! For example:
def __init__(self, input=None):
    super(Python3Lexer, self).__init__(input)
    self.checkVersion("4.6")
    self._interp = LexerATNSimulator(self, self.atn, self.decisionsToDFA, PredictionContextCache())
    self._actions = None
    self._predicates = None

// A queue where extra tokens are pushed on (see the NEWLINE lexer rule).
private java.util.LinkedList<Token> tokens = new java.util.LinkedList<>();
// The stack that keeps track of the indentation level.
private java.util.Stack<Integer> indents = new java.util.Stack<>();
It's unusable because of this. Does anyone know why this is happening and how I can fix it? Thanks!

This grammar is full of action code written in Java to deal with the peculiarities of Python. You have to port that code to Python by hand to make the grammar usable for you. This is why grammar writers are encouraged to move action code out into base classes or listener code.
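As a rough illustration of what such a port involves, the two Java members quoted in the question map naturally onto a collections.deque and a plain list. The sketch below is not the grammar's actual action code: the class name IndentTracker and its method on_newline are hypothetical, the tokens are plain strings, and it only models the INDENT/DEDENT bookkeeping without touching the ANTLR runtime.

```python
from collections import deque


class IndentTracker:
    """Hypothetical stand-in for the state the Java actions keep in the lexer."""

    def __init__(self):
        # Java: private java.util.LinkedList<Token> tokens
        self.tokens = deque()
        # Java: private java.util.Stack<Integer> indents
        self.indents = []

    def on_newline(self, indent):
        """Queue INDENT/DEDENT markers for a new line indented by `indent` columns."""
        previous = self.indents[-1] if self.indents else 0
        if indent > previous:
            self.indents.append(indent)
            self.tokens.append("INDENT")
        else:
            # Pop every level deeper than the new line, one DEDENT per level.
            while self.indents and self.indents[-1] > indent:
                self.indents.pop()
                self.tokens.append("DEDENT")


tracker = IndentTracker()
for width in (0, 4, 8, 0):
    tracker.on_newline(width)
print(list(tracker.tokens))
```

In a real port the same state would presumably live in the generated lexer, or in a hand-written base class named through the grammar's superClass option, and the strings would be real token objects.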

Related

Nim equivalent to python's `help()`

Does Nim compile in docstrings, so we can echo them at runtime?
Something like:
>>> echo help(echo)
"Writes and flushes the parameters to the standard output.[...]"

edited: there is a way to implement a help functionality
It turns out that using macros.getImpl it is possible to implement a functionality similar to the echo you report above (playground):
import macros

macro implToStr*(ident: typed): string =
  toStrLit(getImpl(ident))

template help*(ident: typed) =
  echo implToStr(ident)

help(echo)
output:
proc echo(x: varargs[typed, `$`]) {.magic: "Echo", tags: [WriteIOEffect],
gcsafe, locks: 0, sideEffect.}
## Writes and flushes the parameters to the standard output.
##
## Special built-in that takes a variable number of arguments. Each argument
## is converted to a string via ``$``, so it works for user-defined
## types that have an overloaded ``$`` operator.
## It is roughly equivalent to ``writeLine(stdout, x); flushFile(stdout)``, but
## available for the JavaScript target too.
##
## Unlike other IO operations this is guaranteed to be thread-safe as
## ``echo`` is very often used for debugging convenience. If you want to use
## ``echo`` inside a `proc without side effects
## <manual.html#pragmas-nosideeffect-pragma>`_ you can use `debugEcho
## <#debugEcho,varargs[typed,]>`_ instead.
The output is a bit wonky (a very large indent after the first line of the docstring) and the approach has some limitations: it does not work on some symbols (e.g. it works on toStrLit but not on getImpl), and it does not work on overloaded symbols. These limitations might be lifted in the future, either by a better implementation or by fixes to the compiler/stdlib.
previous answer
No, Nim will not compile docstrings (edit: docstrings are not available at runtime, but they are part of the AST and they can be accessed at compile time, see above). With a supported editor you can get help hovering over identifiers or going to definition in source code.
For example in VS Code (with Nim extension), hovering over echo will give you:
And pressing F12 (on Windows) will go to the definition of echo in system.nim.
Another useful resource for standard library identifiers is the searchable index.

How to use IP_FILTER with python libtorrent

The question I have is: How can I use ip_filter in libtorrent using the python language.
The goal I am trying to achieve is: block all IP addresses (incoming or outgoing traffic) using the libtorrent ip_filter, except for the ones I allow. The code snippet below is where I try to achieve my goal...
import libtorrent

class Session:
    def __init__(self):
        self.session = libtorrent.session({'listen_interfaces': '0.0.0.0:6881'})
        self.ip_filter = None
        # ….more….

    def set_access_rules(self):
        self.ip_filter = libtorrent.ip_filter()
        self.ip_filter.add_rule('0.0.0.0', '255.255.255.255', 1)  # I assume '1' means blocking
        self.ip_filter.add_rule('172.16.100.36', '172.16.100.36', 0)  # I assume '0' means allow, prob. wrong...
        self.session.set_ip_filter(self.ip_filter)
The documentation (in the C++ source) says:
// Adds a rule to the filter. first and last defines a range of
// ip addresses that will be marked with the given flags. The flags
// can currently be 0, which means allowed, or ip_filter::blocked, which
// means disallowed.
ip_filter::blocked <- This is where I get stuck: how do I write that in Python?
The thing is that if I call ‘handle.get_peer_info()’ I expect only to see 172.16.100.36 but I see all sorts of public addresses… Note: My torrent has no trackers and I configured no trackers elsewhere. Can you maybe give me an example in python how to achieve my goal?
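To make the intended behaviour concrete, here is a small pure-Python model of a range filter in which a later rule overrides an earlier one for the addresses it covers. It uses only the standard ipaddress module and does not call libtorrent at all; the class RangeFilter, the constant BLOCKED = 1, and the last-rule-wins behaviour are assumptions made for illustration, not a statement of how the libtorrent Python bindings behave.

```python
import ipaddress

ALLOWED = 0  # the quoted documentation says 0 means allowed
BLOCKED = 1  # assumption: stands in for ip_filter::blocked


class RangeFilter:
    """Toy IPv4 range filter: the most recently added matching rule wins."""

    def __init__(self):
        self._rules = []  # (first, last, flags) in insertion order

    def add_rule(self, first, last, flags):
        self._rules.append(
            (ipaddress.IPv4Address(first), ipaddress.IPv4Address(last), flags)
        )

    def access(self, address):
        ip = ipaddress.IPv4Address(address)
        flags = ALLOWED  # addresses that no rule covers default to allowed
        for first, last, rule_flags in self._rules:
            if first <= ip <= last:
                flags = rule_flags  # later rules overwrite earlier ones
        return flags


f = RangeFilter()
f.add_rule('0.0.0.0', '255.255.255.255', BLOCKED)      # block everything...
f.add_rule('172.16.100.36', '172.16.100.36', ALLOWED)  # ...except one peer
print(f.access('172.16.100.36'), f.access('8.8.8.8'))
```

If the real filter follows the same ordering rule, the two add_rule calls in the question are already in the right order, and the remaining open point is only which Python value corresponds to ip_filter::blocked.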

Can you use mock_open to simulate serial connections?

Morning folks,
I'm trying to get a few unit tests going in Python to confirm my code is working, but I'm having a real hard time getting a Mock anything to fit into my test cases. I'm new to Python unit testing, so this has been a trying week thus far.
The summary of the program is I'm attempting to do serial control of a commercial monitor I got my hands on and I thought I'd use it as a chance to finally use Python for something rather than just falling back on one of the other languages I know. I've got pyserial going, but before I start shoving a ton of commands out to the TV I'd like to learn the unittest part so I can write for my expected outputs and inputs.
I've tried using a library called dummyserial, but it didn't seem to be recognising the output I was sending. I thought I'd give mock_open a try as I've seen it works like a standard IO as well, but it just isn't picking up on the calls either. Samples of the code involved:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8')
    read_text = 'Stuff\r'
    mo = mock_open(read_data=read_text)
    mo.in_waiting = len(read_text)
    with patch('__main__.open', mo):
        with open('./serial', 'a+b') as com:
            tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
            tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
            com.write(b'some junk')
    print(mo.mock_calls)
    mo().write.assert_called_with('{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8'))
And in the SharpTV class, the function in question:
def sendCmd(self, type, msg):
    sent = self.com.write('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
    print('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
Obviously, I'm attempting to control a Sharp TV. I know the commands are correct, that isn't the issue. The issue is just the testing. According to documentation on the mock_open page, calling mo.mock_calls should return some data that a call was made, but I'm getting just an empty set of []'s even in spite of the blatantly wrong com.write(b'some junk'), and mo().write.assert_called_with(...) is returning with an assert error because it isn't detecting the write from within sendCmd. What's really bothering me is I can do the examples from the mock_open section in interactive mode and it works as expected.
I'm missing something, I just don't know what. I'd like help getting either dummyserial working, or mock_open.
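One way to sidestep patching open altogether, since SharpTV already receives its port object through the com argument, is to hand it a unittest.mock.MagicMock and assert on the recorded write call. This is only a sketch and not a diagnosis of the failing test above: the SharpTV class below is a cut-down stand-in with just sendCmd, and the SharpCodes values are made up for the example.

```python
from unittest.mock import MagicMock

# Hypothetical command table; the real codes are not shown in the question.
SharpCodes = {'POWER': 'POWR', 'CHECK': '?'}


class SharpTV:
    """Cut-down stand-in for the class in the question."""

    def __init__(self, com):
        self.com = com

    def sendCmd(self, type, msg):
        # Same formatting as the question: command, then a 4-wide right-aligned argument.
        self.com.write('{0}{1:>4}\r'.format(type, msg).encode('utf-8'))


com = MagicMock()  # records every call made on the fake port
tv = SharpTV(com=com)
tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])

# Raises AssertionError unless write() was called exactly once with these bytes.
com.write.assert_called_once_with(b'POWR   ?\r')
print(com.mock_calls)
```

Inside a unittest.TestCase the last few lines would simply become the body of the test method. A reply from the device can be simulated in the same way, for example by setting com.read.return_value before calling the code under test.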

To answer one part of my question, I figured out the functionality of dummyserial. The following works now:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK'])
    com = dummyserial.Serial(
        port='COM1',
        baudrate=9600,
        ds_responses={powerCheck : powerCheck}
    )
    tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
    tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
    self.assertEqual(tv.recv(), powerCheck)
Previously I was encoding the dictionary values as utf-8. The dummyserial library decodes whatever you write(...) to it so it's a straight string vs. string comparison. It also encodes whatever you're read()ing as latin1 on the way back out.

ANTLR4 and the Python target

I'm having issues getting going with a Python target in ANTLR4. There seem to be very few examples available, and going to the corresponding Java code doesn't seem relevant.
I'm using the standard Hello.g4 grammar:
// Define a grammar called Hello
grammar Hello;
r : 'hello' ID ; // match keyword hello followed by an identifier
ID : [a-z]+ ; // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
The example (built from the standard Hello.g4 example):
input_ = antlr4.FileStream(_FILENAME)
lexer = HelloLexer.HelloLexer(input_)
stream = antlr4.CommonTokenStream(lexer)
parser = HelloParser.HelloParser(stream)
rule_name = 'r'
tree = getattr(parser, rule_name)()
I also wrote a listener. To assert/verify that this is correct, I'll repeat it here:
class HelloListener(antlr4.ParseTreeListener):
    def enterR(self, ctx):
        print("enterR")

    def exitR(self, ctx):
        print("exitR")

    def enterId(self, ctx):
        print("enterId")

    def exitId(self, ctx):
        print("exitId")
So, first, I can't guarantee that the string I'm giving it is valid because I'm not getting any screen output. How do I tell from the tree object if anything was matched? How do I extract the matching rules/tokens?
A Python example would be great, if possible.

I hear you; I'm having the same issues right now. The Python documentation for v4 is useless and v3 differs too much to be usable. I'm thinking about switching back to Java to implement my stuff.
Regarding your code: I think your own custom listener has to inherit from the generated HelloListener. You can do the printing there.
Also try parsing invalid input to see if the parser starts at all. I'm not sure about the line with getattr(parser, rule_name)() though. I followed the steps in the (unfortunately very short) documentation for the Antlr4 Python target: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Python+Target
You can also find some documentation about the listener stuff there. Hope it helps.

This question seems to be old, but I also had the same problems and found out how to deal with it. When using strings in python, you have to use the function antlr4.InputStream as pointed out here
So, in the end, you could get a working example with this sort of code (based on Alan's answer and an example from dzone)
from antlr4 import *
from grammar.HelloListener import HelloListener
from grammar.HelloLexer import HelloLexer
from grammar.HelloParser import HelloParser
import sys

class HelloPrintListener(HelloListener):
    def enterHi(self, ctx):
        print("Hello: %s" % ctx.ID())

def main():
    giveMeInput = input("say hello XXXX\n")
    print("giveMeInput is {0}".format(giveMeInput))
    # https://www.programcreek.com/python/example/93166/antlr4.InputStream
    # https://groups.google.com/forum/#!msg/antlr-discussion/-9VJ5H9NcDs/OukVNCTQCAAJ
    i_stream = InputStream(giveMeInput)
    lexer = HelloLexer(i_stream)
    t_stream = CommonTokenStream(lexer)
    parser = HelloParser(t_stream)
    tree = parser.hi()
    printer = HelloPrintListener()
    walker = ParseTreeWalker()
    walker.walk(printer, tree)

if __name__ == '__main__':
    main()

I've created an example for Python 2 using the Hello grammar.
Here's the relevant code:
from antlr4 import *
from HelloLexer import HelloLexer
from HelloListener import HelloListener
from HelloParser import HelloParser
import sys

class HelloPrintListener(HelloListener):
    def enterHi(self, ctx):
        print("Hello: %s" % ctx.ID())

def main():
    lexer = HelloLexer(StdinStream())
    stream = CommonTokenStream(lexer)
    parser = HelloParser(stream)
    tree = parser.hi()
    printer = HelloPrintListener()
    walker = ParseTreeWalker()
    walker.walk(printer, tree)

if __name__ == '__main__':
    main()
As fabs said, the key is to inherit from the generated HelloListener. There seems to be some pickiness on this issue, as you can see if you modify my HelloPrintListener to inherit directly from ANTLR's ParseTreeListener. I imagined that would work since the generated HelloListener just has empty methods, but I saw the same behavior you saw (listener methods never being called).
Even though the documentation for Python listeners is lacking, the available methods are similar to those in Java.

The ANTLR documentation has been updated to document support for the Python 2 and Python 3 targets. The examples from the ANTLR book, converted to Python 3, can be found here; they are sufficient to get anyone started.

How to debug Python memory fault?

Edit: Really appreciate help in finding bug - but since it might prove hard to find/reproduce, any general debug help would be greatly appreciated too! Help me help myself! =)
Edit 2: Narrowing it down, commenting out code.
Edit 3: Seems lxml might not be the culprit, thanks! The full script is here. I need to go over it looking for references. What do they look like?
Edit 4: Actually, the script stops (goes to 100%) in this, the parse_og part of it. So edit 3 is false - it must be lxml somehow.
Edit 5 MAJOR EDIT: As suggested by David Robinson and TankorSmash below, I've found a type of data content that will send lxml.etree.HTML( data ) in a wild loop. (I carelessly disregarded it, but find my sins redeemed as I've paid a price to the tune of an extra two days of debug! ;) A working crashing script is here. (Also opened a new question.)
Edit 6: Turns out this is a bug with lxml version 2.7.8 and below (at least). Updated to lxml 2.9.0, and the bug is gone. Thanks also to the fine folks over at this follow-up question.
I don't know how to debug this weird problem I'm having.
The below code runs fine for about five minutes, when the RAM is suddenly completely filled up (from 200MB to 1700MB during the 100% period - then when memory is full, it goes into blue wait state).
It's due to the code below, specifically the first two lines. That's for sure. But what is going on? What could possibly explain this behaviour?
def parse_og(self, data):
    """ lxml parsing to the bone! """
    try:
        tree = etree.HTML( data ) # << break occurs on this line >>
        m = tree.xpath("//meta[@property]")
        #for i in m:
        #    y = i.attrib['property']
        #    x = i.attrib['content']
        #    # self.rj[y] = x # commented out in this example because code fails anyway
        tree = ''
        m = ''
        x = ''
        y = ''
        i = ''
        del tree
        del m
        del x
        del y
        del i
    except Exception:
        print 'lxml error: ', sys.exc_info()[1:3]
        print len(data)
        pass

You can try low-level Python debugging with GDB. There is probably a bug in the Python interpreter or in the lxml library, and it is hard to find without extra tools.
You can interrupt your script running under gdb when CPU usage goes to 100% and look at the stack trace. It will probably help you understand what's going on inside the script.

It must be due to some references that keep the documents alive. One must always be careful with string results from XPath evaluation. I see you have assigned None to tree and m but not to y, x and i.
Can you also assign None to y, x and i?

Tools are also helpful when trying to track down memory problems. I've found guppy to be a very useful Python memory profiling and exploration tool.
It is not the easiest to get started with due to a lack of good tutorials / documentation, but once you get to grips with it you will find it very useful. Features I make use of:
Remote memory profiling (via sockets)
Basic GUI for charting usage, optionally showing live data
Powerful, and consistent, interfaces for exploring data usage in a Python shell
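Besides guppy, Python 3.4 and later ship tracemalloc in the standard library, which can answer the narrower question of which source lines hold the most additional memory between two points in a run. The sketch below is an illustration rather than part of the answer above: leaky_step is a made-up workload, the code needs Python 3 (unlike the Python 2 script in the question), and tracemalloc only sees allocations made through Python's own allocator, so memory held inside a C library may not show up.

```python
import tracemalloc

retained = []  # simulates references that keep data alive


def leaky_step():
    """Made-up workload: every call retains roughly 100 KB."""
    retained.append(bytearray(100000))


tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(50):
    leaky_step()
after = tracemalloc.take_snapshot()

# Rank source lines by how much more memory they hold after the loop.
top = after.compare_to(before, 'lineno')
for stat in top[:3]:
    print(stat)
tracemalloc.stop()
```

The first entry printed should point at the bytearray line inside leaky_step, which is the kind of hint that helps decide where to look for lingering references.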
