In LR parsing, is it possible to construct a non-binary AST? - python

I am currently trying to build a parser for propositional logic using the Python SLY module. SLY is a Python implementation of lex and yacc.
https://sly.readthedocs.io/en/latest/sly.html#introduction
The documentation says, "SLY provides no special functions for constructing an abstract syntax tree. However, such construction is easy enough to do on your own." This is what I am trying to do. In their example code, they recommend doing this by defining your own data structure for tree nodes, and using it in the grammar rules.
class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right
class Number(Expr):
    def __init__(self, value):
        self.value = value
@_('expr PLUS expr',
   'expr MINUS expr',
   'expr TIMES expr',
   'expr DIVIDE expr')
def expr(self, p):
    return BinOp(p[1], p.expr0, p.expr1)
@_('LPAREN expr RPAREN')
def expr(self, p):
    return p.expr
My problem is that for my application of parsing propositional logic, although this way of parsing would correctly check syntax and represent the meaning of the logic expression parsed, the parser would construct the AST as a binary tree. Hence, if I were to let it parse the following two expressions:
pvqvr
pv(qvr)
The resulting ASTs would look the same (with right associativity).
For a different part of my project, it is important for me to treat conjunction and disjunction operations as n-ary rather than binary. Taking the first expression above as an example, the disjunction operation is being applied to the three operands p, q, and r simultaneously. I will need to be able to distinguish between the two example expressions above by just looking at the AST itself. The following diagrams show the difference I am going after
    v             v
  / | \          / \
 p  q  r        p   v
                   / \
                  q   r
Is it theoretically possible with LR parsing to create ASTs with nodes that have more than two children? If so, is the SLY framework robust enough for me to be able to do this, or do I need to create my own parser? If LR parsing is incapable of creating such a tree, are there other algorithms I should consider? I am not doing any further compiling after creating the tree, I just need to form trees that represent propositional logic expressions as indicated above.
Apologies in advance if it's a stupid question, I just took Programming Languages and Translators in the Spring 2020 semester, and with everything that's been going on in the world, the learning experience was rather disruptive. I would greatly appreciate any advice. Thanks so much!

Certainly you can do it. It's just playing around with data structures, after all. However, it's tricky (though certainly not impossible) to cover all the cases while you're parsing, so it may be easier (and more efficient) to transform the tree after the parse is complete.
The key problem is that when you are parsing expr OR expr, it is possible that either or both expr non-terminals are already OR nodes, whose lists need to be combined. So you might start with something like this:
class BinOp(Expr):
    def __init__(self, op, left, right):
        # Leaf nodes (e.g. variables) may have no 'op' attribute, hence getattr.
        if getattr(left, 'op', None) == op:
            left_ops = left.operands
        else:
            left_ops = (left,)
        if getattr(right, 'op', None) == op:
            right_ops = right.operands
        else:
            right_ops = (right,)
        self.op = op
        self.operands = left_ops + right_ops
@_('expr OR expr',
   'expr AND expr')
def expr(self, p):
    return BinOp(p[1], p.expr0, p.expr1)
That will work. But here's my suspicion (because it's happened to me, over and over again with different variations): at some point you'll want to apply De Morgan's laws (perhaps not consistently, but in some cases), so you'll end up turning some negated conjunction nodes into disjunctions and/or negated disjunction nodes into conjunctions. And after you do that, you'll want to compress the new disjunction (or conjunction) nodes again, because otherwise your newly created nodes may violate the constraint that the operands of a conjunction/disjunction operator cannot be conjunctions/disjunctions (respectively). And as you crawl through the tree applying De Morgan, you might end up doing various flips which require more compression passes...
So my hunch is that you'll find yourself with less repetitive code and a clearer control flow if you first parse (which often naturally produces binary trees) and then do the various transformations you need in an appropriate order.
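A post-parse flattening pass of that kind might look like the sketch below. The `Var` and `Op` classes and the `grouped` flag are hypothetical stand-ins, not part of SLY; the assumption is that the parenthesis rule sets `grouped=True` on the node it returns, so that p v (q v r) can stay nested while p v q v r is merged into one node.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    op: str                      # e.g. 'or' / 'and'
    operands: Tuple["Node", ...]
    grouped: bool = False        # hypothetical: set by the LPAREN expr RPAREN rule


Node = Union[Var, Op]


def flatten(node: Node) -> Node:
    """Merge same-operator children into their parent, bottom-up.

    Children marked as grouped (written in explicit parentheses)
    are kept as separate nodes.
    """
    if isinstance(node, Var):
        return node
    merged = []
    for child in map(flatten, node.operands):
        if isinstance(child, Op) and child.op == node.op and not child.grouped:
            merged.extend(child.operands)
        else:
            merged.append(child)
    return Op(node.op, tuple(merged), node.grouped)


p, q, r = Var("p"), Var("q"), Var("r")
# Binary tree a parser might build for "p v q v r"
chained = Op("or", (Op("or", (p, q)), r))
# Binary tree for "p v (q v r)", with the parenthesized part marked
nested = Op("or", (p, Op("or", (q, r), grouped=True)))
flat_chained = flatten(chained)
flat_nested = flatten(nested)
```

If the distinction between the two inputs does not matter for a later pass (for instance after applying De Morgan's laws), the same function can be rerun with the `grouped` check dropped.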
Nonetheless, there are certainly grammars which naturally produce multivalent nodes rather than binary nodes; the classic one is argument lists, but any list structure will have the same effect. Here, the list is (probably) not the result of flattening parenthetic subexpressions, though. It simply responds to a grammar such as:
@_('expr')
def exprlist(self, p):
    return [p.expr]
@_('exprlist "," expr')
def exprlist(self, p):
    p.exprlist.append(p.expr)
    return p.exprlist
@_('ID "(" exprlist ")" ')
def expr(self, p):
    return ('call', p.ID, p.exprlist)
    # Or, if you want a truly multivalent node:
    # return ('call', p.ID) + tuple(p.exprlist)
SLY can do that sort of thing automatically if you give it EBNF productions, so that might only be slightly interesting.

Related

How to represent combinational circuits in code

I'm writing a python program that does some operations on combinational circuits like comparing for equality to other circuits, merging gates, counting gates, counting connections, finding fanout gates,...
Right now I'm representing the combinational circuits in the following way:
(I also added the testing for equality)
class Circuit:
    def __init__(self):
        self.gates = {}  # key = the gate's number, value = the gate
    def __eq__(self, other):
        if set(self.gates.keys()) != set(other.gates.keys()):
            return False
        for key in self.gates.keys():
            if self.gates[key] != other.gates[key]:
                return False
        return True
class Gate:
    def __init__(self, gate_type, number):
        self.gate_type = gate_type  # and, or, nand, nor, xor, xnor
        self.number = number
        self.incoming_gates = []
        self.outgoing_gates = []
    def __eq__(self, other):
        # i know this is not correct, but in my case correct enough
        return (
            self.gate_type == other.gate_type
            and self.number == other.number
            and len(self.incoming_gates) == len(other.incoming_gates)
            and len(self.outgoing_gates) == len(other.outgoing_gates)
        )
My representation in code seems very laborious to me, so I am looking for a better way to do this. I have searched for best practices on this but didn't find anything.
You're looking to implement a directed graph, with certain data stored in vertices. Wikipedia has a discussion of various ways to represent a graph, and there is a Stack Overflow question about the more general problem.
For quickly modifying the topology of the graph (merging gates, etc.), an adjacency list like the one you have is often useful.
In general I think the test of an architecture is when you actually start to implement it--I'd suspect you'll become very familiar with the benefits and detriments of your design quickly once you get started using it, and be able to adjust or build helper functions as needed.
You could avoid redundancy in the Gate class by only storing the inbound gate references but that would make the rest of your code more complex to implement. I believe the tradeoff of redundancy vs ease of use should weigh in favour of ease of use.
I don't know how you implement the connections between the gates, but if you hold object references in self.incoming_gates / self.outgoing_gates, you can probably define them based only on incoming links and update the source's outgoing_gate list with self automatically (possibly in the constructor itself)
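As a rough illustration of that last suggestion, the sketch below defines a gate only by its incoming links and lets the constructor register the reverse link on each source. The `inputs` parameter and the `connect_from` and `fanout_gates` helpers are made-up names for this example, not part of the original code.

```python
class Gate:
    def __init__(self, gate_type, number, inputs=()):
        self.gate_type = gate_type      # and, or, nand, nor, xor, xnor
        self.number = number
        self.incoming_gates = []
        self.outgoing_gates = []
        for source in inputs:
            self.connect_from(source)

    def connect_from(self, source):
        """Record source -> self on both ends so the two lists stay in sync."""
        self.incoming_gates.append(source)
        source.outgoing_gates.append(self)


def fanout_gates(gates):
    """Return the gates that drive more than one other gate."""
    return [g for g in gates if len(g.outgoing_gates) > 1]


a = Gate("and", 1)
b = Gate("or", 2, inputs=[a])
c = Gate("xor", 3, inputs=[a, b])
```

Here `a` feeds both `b` and `c`, so it is the only fanout gate; the redundancy of storing both directions remains, but callers never have to maintain it by hand.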

How can I improve the runtime of Python implementation of cycle detection in Course Schedule problem?

My aim is to improve the speed of my Python code that has been successfully accepted in a leetcode problem, Course Schedule.
I am aware of the algorithm but even though I am using O(1) data-structures, my runtime is still poor: around 200ms.
My code uses dictionaries and sets:
from collections import defaultdict
from typing import List
class Solution:
    def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
        course_list = []
        pre_req_mapping = defaultdict(list)
        visited = set()
        stack = set()
        def dfs(course):
            if course in stack:
                return False
            stack.add(course)
            visited.add(course)
            for neighbor in pre_req_mapping.get(course, []):
                if neighbor in visited:
                    no_cycle = dfs(neighbor)
                    if not no_cycle:
                        return False
            stack.remove(course)
            return True
        # for course in range(numCourses):
        #     course_list.append(course)
        for pair in prerequisites:
            pre_req_mapping[pair[1]].append(pair[0])
        for course in range(numCourses):
            if course in visited:
                continue
            no_cycle = dfs(course)
            if not no_cycle:
                return False
        return True
What else can I do to improve the speed?
You are calling dfs() for a given course multiple times.
But its return value won't change.
So we have an opportunity to memoize it.
Change your algorithmic approach (here, to dynamic programming)
for the big win.
It's a space vs time tradeoff.
EDIT:
Hmmm, you are already memoizing most of the computation
with visited, so lru_cache would mostly improve clarity
rather than runtime.
It's just a familiar idiom for caching a result.
It would be helpful to add a # comment citing a reference
for the algorithm you implemented.
This is a very nice expression, with defaulting:
pre_req_mapping.get(course, [])
If you use timeit you may find that the generated bytecode
for an empty tuple () is a tiny bit more efficient than that
for an empty list [], as it involves fewer allocations.
Ok, some style nits follow, unrelated to runtime.
As an aside, youAreMixingCamelCase and_snake_case.
PEP-8 asks you to please stick with just snake_case.
This is a fine choice of identifier name:
for pair in prerequisites:
But instead of the cryptic [0], [1] dereferences,
it would be easier to read a tuple unpack:
for course, prereq in prerequisites:
if not no_cycle: is clumsy.
Consider inverting the meaning of dfs' return value,
or rephrasing the assignment as:
cycle = not dfs(course)
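One way to make sure each course is explored only once is to record a per-course state instead of separate `visited` and `stack` sets. The following is a sketch of that idea rather than a drop-in LeetCode submission: `can_finish` and the state names are illustrative, the tuple unpacking follows the suggestion above, and deep prerequisite chains could still hit Python's recursion limit.

```python
from collections import defaultdict
from typing import List


def can_finish(num_courses: int, prerequisites: List[List[int]]) -> bool:
    graph = defaultdict(list)
    for course, prereq in prerequisites:
        graph[prereq].append(course)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = [UNVISITED] * num_courses

    def has_cycle(course: int) -> bool:
        if state[course] == IN_PROGRESS:   # back edge: we returned to the current path
            return True
        if state[course] == DONE:          # already fully explored, result is cached
            return False
        state[course] = IN_PROGRESS
        for nxt in graph.get(course, ()):
            if has_cycle(nxt):
                return True
        state[course] = DONE
        return False

    return not any(has_cycle(c) for c in range(num_courses))
```

Because a course marked `DONE` is never expanded again, the total work is proportional to the number of courses plus the number of prerequisite pairs.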
I think you are doing it in a good way, but since Python is an interpreted language, it's normal for it to run slowly compared with compiled languages like C/C++ and Java, especially for large inputs.
Try to write the same code in C/C++ for example and compare the speed between them.

xml.etree.ElementTree getElementByID()?

How to get the equivalent of getElementByID() with the Python library xml.etree.ElementTree?
There seems to be a method called parseid() but my tree is already parsed. I don't want to parse it again.
I found it myself:
tree.findall('''.//*[@id='fooID']''')[0]
Better or other solutions are still welcome. :-)
The accepted answer does work, but performance can be quite abysmal because (this is my guess, I didn't verify it, and it may also be related to the complexity of XPath) the tree is traversed on every call to findall(), which may or may not be a concern for your use case.
Probably parseid() is indeed what you want if performance is a concern. If you want to obtain such an id mapping on an existing tree, you can also easily perform the traversal once manually.
class getElementById():
    def __init__(self, tree):
        self.di = {}
        def v(node):
            i = node.attrib.get("id")
            if i is not None:
                self.di[i] = node
            for child in node:
                v(child)
        v(tree.getroot())
    def __call__(self, k):
        return self.di[k]
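The same one-time traversal can also be written with `Element.iter()`, which walks the whole subtree without explicit recursion. This is only a sketch: `build_id_index` is a made-up helper name and the XML string is invented sample data.

```python
import xml.etree.ElementTree as ET


def build_id_index(root):
    """Map each 'id' attribute value to its element in a single pass."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


root = ET.fromstring('<a><b id="fooID">hi</b><c id="bar"/></a>')
index = build_id_index(root)
element = index["fooID"]                      # dictionary lookup, no traversal
same_via_xpath = root.find(".//*[@id='fooID']")  # traverses the tree each time
```

The index goes stale if the tree is modified afterwards, so it needs rebuilding (or updating) after any edit that adds, removes, or renames ids.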

Recursively operating on a tree structure: How do I get the state of the "entire" tree?

First, context:
As a side project, I'm building a computer algebra system in Python that yields the steps it takes to solve an equation.
So far, I've been able to parse algebraic expressions and equations into an expression tree. It's structured something like this (not the actual code—may not be running):
# Other operators and math functions are based off this.
# Numbers and symbols also have their own classes with 'parent' attributes.
class Operator(object):
    def __init__(self, *args):
        self.children = list(args)  # a list, so children can be replaced later
        for child in self.children:
            child.parent = self
# the parser does something like this:
expr = Add(1, Mult(3, 4), 5)
On top of this, I have a series of functions that operate recursively to simplify expressions. They're not purely functional, but I'm trying to avoid relying on mutability for operations, instead returning a modified copy of the node I'm working with. Each function looks something like this:
def simplify(node):
    for index, child in enumerate(node.children):
        if isinstance(child, Operator):
            # recurse into the child (recursing on node itself would never terminate)
            node.children[index] = simplify(child)
        else:
            # perform some operations to simplify numbers and symbols
            pass
    return node
The challenge comes in the "step by step" part. I'd like for my "simplification" functions to all be nested generators that "yield" the steps it takes to solve something. So basically, every time each function performs an operation, I'd like to be able to do something like this: yield (deepcopy(node), expression, "Combined like terms.") so that whatever is relying on this library can output something like:
5x + 3*4x + 3
5x + 12x + 3 Simplified product 3*4x into 12x
17x + 3 Combined like terms 5x + 12x = 17x
However, each function only has knowledge about the node it's operating on, but has no idea what the overall expression looks like.
So this is my question: What would be the best way of maintaining the "state" of the entire expression tree so that each "step" has knowledge of the entire expression?
Here are the solutions I've come up with:
Do every operation in place and either use a global variable or an instance variable in a class to store a pointer to the equation. I don't like this because unit testing is tougher, since now I have to set up the class first. You also lose other advantages of a more functional approach.
Pass through the root of the expression to every function. However, this either means I have to repeat every operation to also update the expression or that I have to rely on mutability.
Have the top level function 'reconstruct' the expression tree based on each step I yield. For example, if I yield 5x + 4x = 9x, have the top level function find the (5x + 4x) node and replace it with '9x'. This seems like the best solution, but how best to 'reconstruct' each step?
Two final, related questions: Does any of this make sense? I have a lot of caffeine in my system right now and have no idea if I'm being clear.
Am I worrying too much about mutability? Is this a case of premature optimization?
You might be asking about tree zippers. Check: Functional Pearl: Weaving a Web and see if it applies to what you want. From reading your question, I think you're asking to do recursion on a tree structure, but be able to navigate back to the top as necessary. Zippers act as a "breadcrumb" to let you get back to the ancestors of the tree.
I have an implementation of one in JavaScript.
Are you using Polish notation to construct the tree?
For the step by step simplification you can just use a loop until no modifications (operations) can be made in the tree.
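One way to get the "whole expression at every step" without globals or zippers is to let each level of the recursion re-wrap the steps produced by its children. The sketch below assumes a simplified representation that is not the asker's class hierarchy: expressions are immutable tuples `(op, child, ...)` with `"+"` and `"*"` as the only operators, and the only rewrite is evaluating an operator whose operands are all numbers.

```python
import math


def simplify_steps(node):
    """Yield (expression, note) pairs; each expression is the full subtree
    rooted at `node` after one more rewrite has been applied."""
    if not isinstance(node, tuple):
        return  # leaves (numbers, symbols) have nothing to simplify
    op, *children = node
    for i, child in enumerate(children):
        for new_child, note in simplify_steps(child):
            children[i] = new_child
            # Re-wrap the child's step in this node, so the caller sees
            # the state of the whole subtree, not just the changed part.
            yield (op, *children), note
    if all(isinstance(c, (int, float)) for c in children):
        value = sum(children) if op == "+" else math.prod(children)
        yield value, f"Evaluated {op!r} over {children}"


steps = list(simplify_steps(("+", 1, ("*", 3, 4), 5)))
trees = [tree for tree, _ in steps]
```

Since the outermost call is made on the root, every yielded tuple is a complete snapshot of the expression, and no node needs a parent pointer.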

Example for ast.NodeTransformer that mutates an equation

This is a continuation of my last question. I want to parse an equation and work on the ast I get. What I want to do is basically randomly scramble it so I get a new equation, that has to be also a valid function. This is to be used in a genetic algorithm.
Here is where I start:
import ast
# Note: Python 2 syntax (print statements, visit_Print), as in the original post.
class Py2do(ast.NodeTransformer):
    def __init__(self):
        self.tree=[]
    def generic_visit(self, node):
        print type(node).__name__
        self.tree.append(type(node).__name__)
        ast.NodeVisitor.generic_visit(self, node)
        depth=3
        s = node.__dict__.items()
        s = " ".join("%s %r" % x for x in sorted(node.__dict__.items()))
        print( "%s%s\t%s" % (depth, str(type(node)), s) )
        for x in ast.iter_child_nodes(node):
            print (x, depth)
    def visit_Name(self, node):
        # print 'Name :', node.id
        pass
    def visit_Num(self, node):
        print 'Num :', node.__dict__['n']
    def visit_Str(self, node):
        print "Str :", node.s
    def visit_Print(self, node):
        print "Print :"
        ast.NodeVisitor.generic_visit(self, node)
    def visit_Assign(self, node):
        print "Assign :"
        ast.NodeVisitor.generic_visit(self, node)
    def visit_Expr(self, node):
        print "Expr :"
        ast.NodeVisitor.generic_visit(self, node)
if __name__ == '__main__':
    node = ast.parse("res= e**(((-0.5*one)*((delta_w*one/delta*one)**2)))")
    import ast_pretty
    print ast.dump(node)
    pprintAst(node)
    v = Py2do()
    v.visit(node)
    print v.tree
What I want to get out is something like this :
res = e**(delta*((one/delta_w*one)**2))
or another valid random equation of some sort. This will be used in a Fortran program, so it would be nice if the resulting equation can also be transferred into Fortran.
Please comment your code and provide a test sample/unit test.
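For reference, a minimal mutating transformer could look like the sketch below. It is written for Python 3.9+ (it relies on `ast.unparse`), and the `SwapOperands` class, the `mutate` helper, and the `rate` and `seed` parameters are all invented for this example. It only swaps the operands of binary operators at random, so the result always remains a syntactically valid expression; it does not change operators or constants.

```python
import ast
import random


class SwapOperands(ast.NodeTransformer):
    """Randomly swap the left and right operands of binary operators."""

    def __init__(self, rate=0.5, seed=None):
        self.rate = rate                  # probability of swapping each BinOp
        self.rng = random.Random(seed)    # seeded for reproducible mutations

    def visit_BinOp(self, node):
        self.generic_visit(node)          # mutate the sub-expressions first
        if self.rng.random() < self.rate:
            node.left, node.right = node.right, node.left
        return node


def mutate(source, rate=0.5, seed=None):
    """Parse `source`, apply the random swaps, and return the new source text."""
    tree = SwapOperands(rate, seed).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


always = mutate("res = a - b * c", rate=1.0)
never = mutate("res = a - b * c", rate=0.0)
scrambled = mutate("res= e**(((-0.5*one)*((delta_w*one/delta*one)**2)))", seed=1)
```

With `rate=1.0` every operator is swapped, which makes the behaviour easy to check; other mutations (replacing an operator, substituting a name) could be added as further `visit_*` methods in the same style.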
So the input and the output are Fortran code? And you want to use arbitrary Fortran expressions/statements? (Including array slices, ...?) Fortran is a pretty complex language; reading it requires pretty much a full parser.
Perhaps you want to use a program transformation tool that can already manipulate Fortran directly. Such a tool would read the Fortran code, build an AST, let you "randomize" it using a set of randomly chosen transformations, and then regenerate valid Fortran code.
Our DMS Software Reengineering Toolkit with its Fortran front end could be directly used for this.
EDIT Aug 26 2011: OP confirms he wants to "evolve" (transform) real Fortran code. It is worth noting that building a real Fortran parser (like building parsers for any other real language) is pretty hard; it took us months and our tools are really good at defining parsers (we've done some 40 languages and a variety of dialects using DMS). It is probably not a good idea for him to build his own real Fortran parser, at least not if he wants to get on with his life or his actual task.
It might be possible for OP to constrain the Fortran code to a very restricted subset, and build a parser for that.
What are you trying to do? Looking for the right permutation of an equation might be easy but time-consuming (n! possibilities), but generating new ones and optimizing them with a genetic algorithm is, in my opinion, impossible, because it's not an optimization problem. For example, x^0.00 and x^0.01 are fundamentally different. Also, you cannot optimize for the right operator; that just won't work. Sorry.
That said, the situation isn't that bad. Looking for the right function is an extremely common task. I am now assuming that you do not know the function, but you know a couple of points from measurements (you would have needed those to calculate the fitness in your genetic algorithm anyway, wouldn't you?). You can now use Lagrange interpolation to get a polynomial which passes through those given points. There are two good examples in the middle of the Wikipedia article, and Lagrange interpolation is quite easy to implement (<10 lines of code, I guess). Also note that you can improve the accuracy of the polynomial just by adding more reference points.
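A direct implementation of the Lagrange form is indeed short. The sketch below is illustrative only: the function name `lagrange` and the sample points (taken from x^2 + x + 1) are made up, and the x values are assumed to be pairwise distinct.

```python
def lagrange(points):
    """Return a function evaluating the Lagrange polynomial through `points`,
    a list of (x, y) pairs with pairwise distinct x values."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)   # basis polynomial factor
            total += term
        return total
    return p


poly = lagrange([(0, 1), (1, 3), (2, 7)])   # samples of x**2 + x + 1
value_at_3 = poly(3)
```

Three samples of a quadratic determine it exactly, so evaluating at a new point reproduces the underlying function up to floating-point error.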
