Example for ast.NodeTransformer that mutates an equation - python

This is a continuation of my last question. I want to parse an equation and work on the AST I get. Basically, I want to scramble it randomly to get a new equation, which must also be a valid function. This is to be used in a genetic algorithm.
Here is my starting point:
import ast
class Py2do(ast.NodeTransformer):
    def __init__(self):
        self.tree = []
    def generic_visit(self, node):
        print type(node).__name__
        self.tree.append(type(node).__name__)
        ast.NodeVisitor.generic_visit(self, node)
        depth = 3
        s = " ".join("%s %r" % x for x in sorted(node.__dict__.items()))
        print("%s%s\t%s" % (depth, str(type(node)), s))
        for x in ast.iter_child_nodes(node):
            print(x, depth)
    def visit_Name(self, node):
        # print 'Name :', node.id
        pass
    def visit_Num(self, node):
        print 'Num :', node.n
    def visit_Str(self, node):
        print "Str :", node.s
    def visit_Print(self, node):
        print "Print :"
        ast.NodeVisitor.generic_visit(self, node)
    def visit_Assign(self, node):
        print "Assign :"
        ast.NodeVisitor.generic_visit(self, node)
    def visit_Expr(self, node):
        print "Expr :"
        ast.NodeVisitor.generic_visit(self, node)
if __name__ == '__main__':
    node = ast.parse("res= e**(((-0.5*one)*((delta_w*one/delta*one)**2)))")
    # assumed: pprintAst comes from the asker's own ast_pretty helper module (not the stdlib)
    from ast_pretty import pprintAst
    print ast.dump(node)
    pprintAst(node)
    v = Py2do()
    v.visit(node)
    print v.tree
What I want to get out is something like this:
res= e**(delta*((one/delta_w*one)**2))
or another valid random equation of some sort. This will be used in a Fortran program, so it would be nice if the resulting equation could also be translated into Fortran.
Please comment your code and provide a test sample/unit test.
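A minimal sketch of such a transformer, written for Python 3.9+ (it relies on ast.unparse) rather than the Python 2 code above. The class name RandomMutator, the mutate() helper and the mutation rate are illustrative choices, not an established recipe. It swaps +, -, *, / for one another, randomly swaps the operands of binary operations and renames variables that are read, but never touches the assignment target res, so every result still parses. Using a seeded random.Random makes each mutation reproducible, which helps when writing unit tests for a genetic algorithm.

```python
import ast
import random


class RandomMutator(ast.NodeTransformer):
    """Randomly rewrite an equation's AST so the result is still valid Python."""

    # Operators allowed to replace one another; ** is left alone so that
    # exponentials such as e**(...) keep their shape.
    SWAPPABLE = (ast.Add, ast.Sub, ast.Mult, ast.Div)

    def __init__(self, rng, names, rate=0.3):
        self.rng = rng        # random.Random instance, seeded for reproducibility
        self.names = names    # variable names that may replace one another
        self.rate = rate      # probability of mutating each eligible node

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate the operands first (bottom-up)
        if isinstance(node.op, self.SWAPPABLE) and self.rng.random() < self.rate:
            node.op = self.rng.choice(self.SWAPPABLE)()  # e.g. * becomes /
        if self.rng.random() < self.rate:
            node.left, node.right = node.right, node.left  # swap operands
        return node

    def visit_Name(self, node):
        # Only rename variables that are read; the target "res" has a Store ctx.
        if isinstance(node.ctx, ast.Load) and self.rng.random() < self.rate:
            return ast.Name(id=self.rng.choice(self.names), ctx=ast.Load())
        return node


def mutate(source, seed=None, rate=0.3):
    """Return a randomly mutated copy of the equation in `source`."""
    tree = ast.parse(source)
    names = sorted({n.id for n in ast.walk(tree)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)})
    tree = RandomMutator(random.Random(seed), names, rate).visit(tree)
    ast.fix_missing_locations(tree)  # newly created Name nodes need line numbers
    return ast.unparse(tree)         # ast.unparse requires Python 3.9+


if __name__ == "__main__":
    eq = "res = e**(((-0.5*one)*((delta_w*one/delta*one)**2)))"
    for seed in range(3):
        print(mutate(eq, seed=seed))
```

With rate=0 the function simply round-trips the equation, which makes a convenient baseline in a unit test; each seed always produces the same mutant.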

So the input and the output are Fortran code? And you want to use arbitrary Fortran expressions/statements? (Including array slices, ...?) Fortran is a pretty complex language; reading it requires pretty much a full parser.
Perhaps you want to use a program transformation tool that can already manipulate Fortran directly. Such a tool would read the Fortran code, build an AST, let you "randomize" it using a set of randomly chosen transformations, and then regenerate valid Fortran code.
Our DMS Software Reengineering Toolkit with its Fortran front end could be directly used for this.
EDIT Aug 26 2011: OP confirms he wants to "evolve" (transform) real Fortran code. It is worth noting that building a real Fortran parser (like building parsers for any other real language) is pretty hard; it took us months and our tools are really good at defining parsers (we've done some 40 languages and a variety of dialects using DMS). It is probably not a good idea for him to build his own real Fortran parser, at least not if he wants to get on with his life or his actual task.
It might be possible for OP to constrain the Fortran code to a very restricted subset, and build a parser for that.
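To give a feel for the restricted-subset route, here is a minimal recursive-descent parser for a hypothetical subset: numeric literals without exponents, identifiers, + - * / **, a leading sign and parentheses. It follows Fortran's precedence in that a leading minus applies to the whole first term (so -a**2 means -(a**2)) and ** is right-associative; two operators in a row such as a*-b are simply rejected by this sketch. The tuple AST and the fully parenthesised output are illustrative choices; real Fortran (1.0d0 literals, intrinsic calls, array sections, ...) is out of scope.

```python
import re

# One token: a number, an identifier, or an operator/parenthesis ("**" first).
TOKEN = re.compile(r"\s*(?:(\d+\.\d*|\.\d+|\d+)|([A-Za-z_]\w*)|(\*\*|[-+*/()]))")


def tokenize(text):
    """Split text into ("num" | "name" | "op", value) pairs."""
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        num, name, op = m.groups()
        if num:
            tokens.append(("num", num))
        elif name:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: one method per precedence level."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, expected=None):
        kind, value = self.peek()
        if kind is None or (expected is not None and value != expected):
            raise SyntaxError(f"expected {expected or 'more input'}, got {value!r}")
        self.i += 1
        return kind, value

    def expr(self):    # expr := ["-"] term (("+" | "-") term)*
        if self.peek()[1] == "-":
            self.take()
            node = ("neg", self.term())   # sign covers the whole first term
        else:
            node = self.term()
        while self.peek()[1] in ("+", "-"):
            node = (self.take()[1], node, self.term())  # left-associative
        return node

    def term(self):    # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            node = (self.take()[1], node, self.factor())
        return node

    def factor(self):  # factor := atom ["**" factor]   right-associative
        base = self.atom()
        if self.peek()[1] == "**":
            self.take()
            return ("**", base, self.factor())
        return base

    def atom(self):    # atom := number | name | "(" expr ")"
        kind, value = self.take()
        if kind in ("num", "name"):
            return (kind, value)
        if value == "(":
            node = self.expr()
            self.take(")")
            return node
        raise SyntaxError(f"unexpected {value!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] is not None:
        raise SyntaxError("unexpected trailing input")
    return tree


def to_fortran(node):
    """Print the tree back as a fully parenthesised Fortran expression."""
    if node[0] in ("num", "name"):
        return node[1]
    if node[0] == "neg":
        return f"(-{to_fortran(node[1])})"
    op, left, right = node
    return f"({to_fortran(left)}{op}{to_fortran(right)})"
```

Random transformations would then operate on the tuples, and to_fortran regenerates code that a Fortran compiler can read.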

What are you trying to do? Looking for the right permutation of an equation might be easy but time-consuming (n! possibilities), but generating new ones and optimizing those using a genetic algorithm is, IMHO, impossible, because it's not an optimization problem. For example, x^0.00 and x^0.01 are fundamentally different. Also, you cannot optimize for the right operator; that just won't work. Sorry.
That said, the situation isn't that bad. Looking for the right function is an extremely common task. I am now assuming that you do not know the function, but you do know a couple of points from measurements (you would have needed those to calculate the fitness in your genetic algorithm anyway, wouldn't you?). You can then use Lagrange interpolation to get a polynomial that passes through those points. There are two good examples in the middle of the Wikipedia article, and Lagrange interpolation is quite easy to implement (fewer than 10 lines of code, I guess). Also note that you can improve the accuracy of the polynomial just by adding more reference points.
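A minimal sketch of Lagrange interpolation, assuming the measurements are given as two equal-length lists xs and ys with distinct x values (the function name lagrange is illustrative). Each basis term is 1 at its own x value and 0 at all the others, so the weighted sum passes through every point:

```python
def lagrange(xs, ys):
    """Return a callable polynomial through the points (xs[i], ys[i])."""
    if len(set(xs)) != len(xs):
        raise ValueError("x values must be distinct")

    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0  # Lagrange basis polynomial L_i evaluated at x
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total

    return p


p = lagrange([0, 1, 2], [0, 1, 4])  # three samples of y = x**2
print(p(3))  # 9.0
```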

Related

In LR parsing, is it possible to construct a non-binary AST?

I am currently trying to build a parser for propositional logic using the Python SLY module. SLY is a Python implementation of lex and yacc.
https://sly.readthedocs.io/en/latest/sly.html#introduction
The documentation says, "SLY provides no special functions for constructing an abstract syntax tree. However, such construction is easy enough to do on your own." This is what I am trying to do. In their example code, they recommend doing this by defining your own data structure for tree nodes, and using it in the grammar rules.
class Expr:
    pass
class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right
class Number(Expr):
    def __init__(self, value):
        self.value = value
# The rules below live inside a sly.Parser subclass, where _ is defined.
@_('expr PLUS expr',
   'expr MINUS expr',
   'expr TIMES expr',
   'expr DIVIDE expr')
def expr(self, p):
    return BinOp(p[1], p.expr0, p.expr1)
@_('LPAREN expr RPAREN')
def expr(self, p):
    return p.expr
My problem is that for my application of parsing propositional logic, although this way of parsing would correctly check syntax and represent the meaning of the logic expression parsed, the parser would construct the AST as a binary tree. Hence, if I were to let it parse the following two expressions:
pvqvr
pv(qvr)
The resulting ASTs would look the same (with right associativity).
For a different part of my project, it is important for me to treat conjunction and disjunction operations as n-ary rather than binary. Taking the first expression above as an example, the disjunction operation is applied to the three operands p, q, and r simultaneously. I need to be able to distinguish between the two example expressions above just by looking at the AST itself. The following diagrams show the difference I am after:
     v             v
   / | \          / \
  p  q  r        p   v
                    / \
                   q   r
Is it theoretically possible with LR parsing to create ASTs with nodes that have more than two children? If so, is the SLY framework robust enough for me to be able to do this, or do I need to create my own parser? If LR parsing is incapable of creating such a tree, are there other algorithms I should consider? I am not doing any further compiling after creating the tree, I just need to form trees that represent propositional logic expressions as indicated above.
Apologies in advance if it's a stupid question, I just took Programming Languages and Translators in the Spring 2020 semester, and with everything that's been going on in the world, the learning experience was rather disruptive. I would greatly appreciate any advice. Thanks so much!
Certainly you can do it. It's just playing around with data structures, after all. However, it's tricky (though certainly not impossible) to cover all the cases while you're parsing, so it may be easier (and more efficient) to transform the tree after the parse is complete.
The key problem is that when you are parsing expr OR expr, it is possible that either or both expr non-terminals are already OR nodes, whose lists need to be combined. So you might start with something like this:
class BinOp(Expr):
    def __init__(self, op, left, right):
        # Leaves such as Number have no .op attribute, hence getattr.
        if getattr(left, 'op', None) == op:
            left_ops = left.operands
        else:
            left_ops = (left,)
        if getattr(right, 'op', None) == op:
            right_ops = right.operands
        else:
            right_ops = (right,)
        self.op = op
        self.operands = left_ops + right_ops
@_('expr OR expr',
   'expr AND expr')
def expr(self, p):
    return BinOp(p[1], p.expr0, p.expr1)
That will work. But here's my suspicion (because it has happened to me over and over again, in different variations): at some point you'll want to apply De Morgan's laws (perhaps not consistently, but in some cases), so you'll end up turning some negated conjunction nodes into disjunctions and/or negated disjunction nodes into conjunctions. After you do that, you'll want to compress the new disjunction (or conjunction) nodes again, because otherwise your newly created nodes may violate the constraint that the operands of a conjunction/disjunction operator cannot themselves be conjunctions/disjunctions (respectively). And as you crawl through the tree applying De Morgan, you might end up doing various flips that require more compression passes...
So my hunch is that you'll end up with less repetitive code and clearer control flow if you first parse (which often naturally produces binary trees) and then apply the various transformations you need in an appropriate order.
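A sketch of that parse-then-transform approach, under an assumed tuple representation (hypothetical, not something SLY provides): the binary rules return ('or', left, right) or ('and', left, right), negation is ('not', x), variables are ('var', name), and the LPAREN expr RPAREN rule returns ('group', p.expr) instead of p.expr, so parentheses survive into the tree. A single pass then flattens same-operator chains but never merges through a group, which keeps pvqvr and pv(qvr) distinct:

```python
def flatten(node):
    """Collapse chains of the same binary and/or into one n-ary node.

    A ('group', x) node marks a parenthesised x; it is never merged into
    its parent, so p v q v r and p v (q v r) produce different trees.
    """
    tag = node[0]
    if tag == "var":
        return node
    if tag == "group":
        return flatten(node[1])  # merging is already decided; drop the marker
    if tag == "not":
        return ("not", flatten(node[1]))
    operands = []
    for child in node[1:]:
        if child[0] == tag:  # same operator, no parentheses: splice it in
            operands.extend(flatten(child)[1:])
        else:
            operands.append(flatten(child))
    return (tag,) + tuple(operands)


p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
print(flatten(("or", p, ("or", q, r))))              # p v q v r
print(flatten(("or", p, ("group", ("or", q, r)))))   # p v (q v r)
```

Because flatten is a pure function, it can simply be run again after any later rewrite (such as a De Morgan pass) that creates new nested and/or nodes.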
Nonetheless, there are certainly grammars which naturally produce multivalent nodes rather than binary nodes; the classic one is argument lists, but any list structure will have the same effect. Here, the list is (probably) not the result of flattening parenthetic subexpressions, though. It simply responds to a grammar such as:
@_('expr')
def exprlist(self, p):
    return [p.expr]
@_('exprlist "," expr')
def exprlist(self, p):
    p.exprlist.append(p.expr)
    return p.exprlist
@_('ID "(" exprlist ")"')
def expr(self, p):
    return ('call', p.ID, p.exprlist)
    # Or, if you want a truly multivalent node:
    # return ('call', p.ID) + tuple(p.exprlist)
SLY can do that sort of thing automatically if you give it EBNF productions, so that might only be slightly interesting.

How can I improve the runtime of Python implementation of cycle detection in Course Schedule problem?

My aim is to improve the speed of my Python code that has been successfully accepted in a leetcode problem, Course Schedule.
I understand the algorithm, but even though I am using data structures with O(1) operations, my runtime is still poor: around 200 ms.
My code uses dictionaries and sets:
from collections import defaultdict
from typing import List
class Solution:
    def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
        course_list = []
        pre_req_mapping = defaultdict(list)
        visited = set()
        stack = set()
        def dfs(course):
            if course in stack:
                return False
            stack.add(course)
            visited.add(course)
            for neighbor in pre_req_mapping.get(course, []):
                if neighbor in visited:
                    no_cycle = dfs(neighbor)
                    if not no_cycle:
                        return False
            stack.remove(course)
            return True
        # for course in range(numCourses):
        #     course_list.append(course)
        for pair in prerequisites:
            pre_req_mapping[pair[1]].append(pair[0])
        for course in range(numCourses):
            if course in visited:
                continue
            no_cycle = dfs(course)
            if not no_cycle:
                return False
        return True
What else can I do to improve the speed?
You are calling dfs() for a given course multiple times.
But its return value won't change.
So we have an opportunity to memoize it.
Change your algorithmic approach (here, to dynamic programming)
for the big win.
It's a space vs time tradeoff.
EDIT:
Hmmm, you are already memoizing most of the computation
with visited, so lru_cache would mostly improve clarity
rather than runtime.
It's just a familiar idiom for caching a result.
It would be helpful to add a # comment citing a reference
for the algorithm you implemented.
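One way to make that memoization explicit is to give every course one of three states (unvisited, in progress, done), the standard colouring used for cycle detection in a depth-first search. A course marked done is never explored again, so each course and each edge is handled once, O(V + E) overall. This is a sketch, not benchmarked against the original; the snake_case name can_finish is hypothetical, and the explicit stack instead of recursion is a design choice to stay clear of Python's default recursion limit on long prerequisite chains.

```python
from collections import defaultdict
from typing import List

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def can_finish(num_courses: int, prerequisites: List[List[int]]) -> bool:
    graph = defaultdict(list)  # prereq -> courses that depend on it
    for course, prereq in prerequisites:
        graph[prereq].append(course)

    state = [UNVISITED] * num_courses
    for start in range(num_courses):
        if state[start] != UNVISITED:
            continue  # already fully explored: its result is memoized
        state[start] = IN_PROGRESS
        stack = [(start, iter(graph[start]))]  # (node, remaining children)
        while stack:
            node, children = stack[-1]
            for child in children:
                if state[child] == IN_PROGRESS:
                    return False  # back edge to the current path: a cycle
                if state[child] == UNVISITED:
                    state[child] = IN_PROGRESS
                    stack.append((child, iter(graph[child])))
                    break  # descend; this node's iterator resumes later
            else:
                state[node] = DONE  # every descendant finished without a cycle
                stack.pop()
    return True
```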
This is a very nice expression, with defaulting:
pre_req_mapping.get(course, [])
If you use timeit you may find that the generated bytecode
for an empty tuple () is a tiny bit more efficient than that
for an empty list [], as it involves fewer allocations.
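To check the tuple-versus-list claim on your own interpreter, dis shows the difference in the generated bytecode and timeit measures it. This is CPython-specific, and the timings depend on the machine, so no numbers are given here:

```python
import dis
import timeit


def opnames(func):
    """List the bytecode instruction names CPython generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


print(opnames(lambda: ()))  # the empty tuple is loaded as a constant
print(opnames(lambda: []))  # BUILD_LIST allocates a new list on every call

# Time a dict lookup that falls back to each default.
setup = "d = {}"
print(timeit.timeit("d.get(1, ())", setup=setup, number=200_000))
print(timeit.timeit("d.get(1, [])", setup=setup, number=200_000))
```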
Ok, some style nits follow, unrelated to runtime.
As an aside, youAreMixingCamelCase and_snake_case.
PEP 8 asks you to please stick with just snake_case.
This is a fine choice of identifier name:
for pair in prerequisites:
But instead of the cryptic [0], [1] dereferences,
it would be easier to read a tuple unpack:
for course, prereq in prerequisites:
if not no_cycle: is clumsy.
Consider inverting the meaning of dfs' return value,
or rephrasing the assignment as:
cycle = not dfs(course)
I think that you are doing it in a good way, but since Python is an interpreted language, it's normal for it to be slower than compiled languages like C/C++ and Java, especially for large inputs.
Try to write the same code in C/C++ for example and compare the speed between them.

Recursively operating on a tree structure: How do I get the state of the "entire" tree?

First, context:
As a side project, I'm building a computer algebra system in Python that yields the steps it takes to solve an equation.
So far, I've been able to parse algebraic expressions and equations into an expression tree. It's structured something like this (not the actual code—may not be running):
# Other operators and math functions are based off this.
# Numbers and symbols also have their own classes with 'parent' attributes.
class Operator(object):
    def __init__(self, *args):
        # A list (not the args tuple) so children can be replaced by index.
        self.children = list(args)
        for child in self.children:
            child.parent = self
# the parser does something like this:
expr = Add(1, Mult(3, 4), 5)
On top of this, I have a series of functions that operate recursively to simplify expressions. They're not purely functional, but I'm trying to avoid relying on mutability for operations, instead returning a modified copy of the node I'm working with. Each function looks something like this:
def simplify(node):
    for index, child in enumerate(node.children):
        if isinstance(child, Operator):
            # recurse into the child, not the node itself
            node.children[index] = simplify(child)
        else:
            # perform some operations to simplify numbers and symbols
            pass
    return node
The challenge comes in the "step by step" part. I'd like for my "simplification" functions to all be nested generators that "yield" the steps it takes to solve something. So basically, every time each function performs an operation, I'd like to be able to do something like this: yield (deepcopy(node), expression, "Combined like terms.") so that whatever is relying on this library can output something like:
5x + 3*4x + 3
5x + 12x + 3 Simplified product 3*4x into 12x
17x + 3 Combined like terms 5x + 12x = 17x
However, each function only has knowledge about the node it's operating on, but has no idea what the overall expression looks like.
So this is my question: What would be the best way of maintaining the "state" of the entire expression tree so that each "step" has knowledge of the entire expression?
Here are the solutions I've come up with:
Do every operation in place and either use a global variable or an instance variable in a class to store a pointer to the equation. I don't like this because unit testing is tougher, since now I have to set up the class first. You also lose other advantages of a more functional approach.
Pass through the root of the expression to every function. However, this either means I have to repeat every operation to also update the expression or that I have to rely on mutability.
Have the top level function 'reconstruct' the expression tree based on each step I yield. For example, if I yield 5x + 4x = 9x, have the top level function find the (5x + 4x) node and replace it with '9x'. This seems like the best solution, but how best to 'reconstruct' each step?
Two final, related questions: Does any of this make sense? I have a lot of caffeine in my system right now and have no idea if I'm being clear.
Am I worrying too much about mutability? Is this a case of premature optimization?
You might be asking about tree zippers. Check out the functional pearl "Weaving a Web" and see whether it applies to what you want. From reading your question, I think you want to recurse over a tree structure but be able to navigate back to the top when necessary. Zippers act as a "breadcrumb" trail that lets you get back to the ancestors in the tree.
I have an implementation of one in JavaScript.
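A minimal Python rendition of the zipper idea, assuming expressions are immutable tuples of the form (op, *children) with numbers and symbol strings as leaves (an assumption; this is not the asker's Operator class). Moving down records a breadcrumb of the parent's operator and the siblings on either side; replace() edits the focus; root() zips back up and returns the whole updated expression, while the original tree stays untouched:

```python
from typing import Any, NamedTuple, Tuple


class Crumb(NamedTuple):
    """What we remember about a parent when we step down into one child."""
    op: str
    left: Tuple[Any, ...]   # siblings before the child we moved into
    right: Tuple[Any, ...]  # siblings after it


class Zipper(NamedTuple):
    focus: Any                      # the subtree we are looking at
    crumbs: Tuple[Crumb, ...] = ()  # breadcrumbs back to the root

    def down(self, i):
        op, *kids = self.focus
        crumb = Crumb(op, tuple(kids[:i]), tuple(kids[i + 1:]))
        return Zipper(kids[i], self.crumbs + (crumb,))

    def up(self):
        *rest, crumb = self.crumbs
        parent = (crumb.op,) + crumb.left + (self.focus,) + crumb.right
        return Zipper(parent, tuple(rest))

    def replace(self, new_focus):
        return Zipper(new_focus, self.crumbs)

    def root(self):
        z = self
        while z.crumbs:
            z = z.up()
        return z.focus


expr = ("+", ("*", 5, "x"), ("*", ("*", 3, 4), "x"), 3)  # 5x + 3*4x + 3
z = Zipper(expr).down(1).down(0)  # focus on the 3*4 node
print(z.focus)                    # ('*', 3, 4)
print(z.replace(12).root())       # ('+', ('*', 5, 'x'), ('*', 12, 'x'), 3)
```

A simplification step could then yield (z.replace(12).root(), "Simplified product 3*4 into 12"), which carries the entire expression without any global state.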
Are you using Polish notation to construct the tree?
For the step by step simplification you can just use a loop until no modifications (operations) can be made in the tree.
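A sketch of that loop, assuming hypothetical immutable tuple trees of the form (op, *children). Each rule returns either None or a new *whole* tree plus a message, so every yielded step already contains the entire expression and no shared mutable state is needed. fold_products is just one example rule:

```python
def fold_products(node):
    """Rewrite the first ('*', int, int) found, or return None if there is none.

    On success returns (new_tree, message); the tree is rebuilt, not mutated.
    """
    if not isinstance(node, tuple):
        return None
    op, *kids = node
    if op == "*" and len(kids) == 2 and all(isinstance(k, int) for k in kids):
        a, b = kids
        return a * b, f"Simplified product {a}*{b} into {a * b}"
    for i, kid in enumerate(kids):
        hit = fold_products(kid)
        if hit is not None:
            new_kid, message = hit
            return (op, *kids[:i], new_kid, *kids[i + 1:]), message
    return None


def steps(tree, rules):
    """Apply rules until none fires, yielding (whole_tree, message) per step."""
    while True:
        for rule in rules:
            hit = rule(tree)
            if hit is not None:
                tree, message = hit
                yield tree, message
                break
        else:  # no rule changed anything: the tree is fully simplified
            return


start = ("+", ("*", 5, "x"), ("*", ("*", 3, 4), "x"), 3)  # 5x + 3*4x + 3
for tree, message in steps(start, [fold_products]):
    print(tree, message)
```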

User-defined Vector class for python

I've been looking for a way to deal with vectors in Python and haven't found a solution here or in the documentation that completely fits my needs.
This is what I've come up with so far for a vector class:
class vec(tuple):
    def __add__(self, y):
        if len(self) != len(y):
            raise TypeError
        else:
            ret = []
            for i, entry in enumerate(self):
                ret.append(entry + y[i])
            return vec(ret)
    def __mul__(self, y):
        t = y.__class__
        if t == int or t == float:
            # scalar multiplication
            ret = []
            for entry in self:
                ret.append(y * entry)
            return vec(ret)
        elif t == list or t == tuple or t == vec:
            # dot product
            if len(y) != len(self):
                print('vecs dimensions dont fit')
                raise TypeError
            else:
                ret = 0
                for i, entry in enumerate(self):
                    ret += entry * y[i]
                return ret
There's a little bit more, left out to keep things short.
So far everything's working fine, but I have lots of tiny, specific questions (and will probably post more as they come up):
Are there base classes for the numeric and sequence-types and how can I address them?
How can I make all of this more Python-y? I want to learn how to write good Python code, so if you find something that's inefficient or just ugly, please tell me.
What about precision? As python seems to cast from integers to floats only if necessary, input and output are usually of the same type. So there might be problems with very large or small numbers, but I don't really need those currently. Should I generally worry about precision or does python do that for me? Would it be better to convert to the largest possible type automatically? Which one is that? What happens beyond that?
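A few quick checks illustrate the precision question, assuming CPython, where float is an IEEE 754 double on mainstream platforms: int has arbitrary precision, float silently rounds and overflows to inf, and fractions.Fraction gives exact rational arithmetic when that matters (at a speed cost).

```python
import sys
from fractions import Fraction

big = 2 ** 200
print(big + 1 - big)            # ints are arbitrary precision, so this is exact
print(sys.float_info.max)       # largest finite float
print(sys.float_info.max * 2)   # float overflow gives inf instead of raising
print(0.1 + 0.2 == 0.3)         # binary floating point rounds: not equal
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # exact rationals
```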
I want to use n-dimensional vectors in a project involving lots of vector equations and functions, and I'd like to be able to use the usual notation from math textbooks. As you can see, this class inherits from tuple (for easy construction, immutability, and indexing), and most built-in operator methods are overridden so that the +, -, *, ... operators can be used. They only work if the left operand is a vec (can I change that?). Multiplication covers both the dot product and scalar multiplication; pow is used for the cross product if both vecs are 3D.
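Regarding base classes and a left-hand operand that isn't a vec: a sketch of a more Pythonic version (renamed Vec; an illustration, not the asker's full class). It uses the abstract base classes numbers.Number and collections.abc.Sequence for the type checks, returns NotImplemented for unsupported operands so Python can try the other operand's method, and defines the reflected methods __radd__, __rsub__ and __rmul__ so that 2 * v and [1, 2, 3] + v work. Raising ValueError on a dimension mismatch is a judgment call; the cross product via pow is left out.

```python
import numbers
from collections.abc import Sequence


class Vec(tuple):
    """Immutable n-dimensional vector; construct it like a tuple: Vec([1, 2, 3])."""

    def _check(self, other):
        if len(self) != len(other):
            raise ValueError(f"dimension mismatch: {len(self)} vs {len(other)}")

    def __add__(self, other):
        if not isinstance(other, Sequence):
            return NotImplemented  # lets Python try other.__radd__
        self._check(other)
        return Vec(a + b for a, b in zip(self, other))

    __radd__ = __add__  # addition commutes, so [1, 2] + Vec(...) works too

    def __sub__(self, other):
        if not isinstance(other, Sequence):
            return NotImplemented
        self._check(other)
        return Vec(a - b for a, b in zip(self, other))

    def __rsub__(self, other):  # other - self, when other is e.g. a list
        if not isinstance(other, Sequence):
            return NotImplemented
        self._check(other)
        return Vec(b - a for a, b in zip(self, other))

    def __neg__(self):
        return Vec(-a for a in self)

    def __mul__(self, other):
        if isinstance(other, numbers.Number):  # scalar multiplication
            return Vec(other * a for a in self)
        if isinstance(other, Sequence):        # dot product
            self._check(other)
            return sum(a * b for a, b in zip(self, other))
        return NotImplemented

    __rmul__ = __mul__  # makes 2 * v and [1, 0, 0] * v work as well

    def __repr__(self):
        return f"Vec({list(self)!r})"
```

Because Vec is still a tuple, Vec([1, 2]) == (1, 2) holds and indexing or unpacking behaves as before.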
Test Script:
import cProfile
import random
import numpy as np
import utils  # the asker's module that defines the vec class above
def testVec():
    rnd = random.Random()
    for i in range(0, 10000):
        a = utils.vec((rnd.random(), rnd.random(), rnd.random()))
        ### functions to test
        a*(a*a)
        ###
def testNumpy():
    rnd = random.Random()
    for i in range(0, 10000):
        a = np.array((rnd.random(), rnd.random(), rnd.random()))
        ###
        a.dot(a)*a
        ###
cProfile.run('testNumpy()')
-> 50009 function calls in 0.135 seconds
cProfile.run('testVec()')
-> 100009 function calls in 0.064 seconds

Recipe for anonymous functions in python?

I'm looking for the best recipe to allow inline definition of functions, or multi-line lambdas, in Python.
For example, I'd like to do the following:
def callfunc(func):
    func("Hello")
>>> callfunc(define('x', '''
... print x, "World!"
... '''))
Hello World!
I've found an example for the define function in this answer:
def define(arglist, body):
    g = {}
    exec("def anonfunc({0}):\n{1}".format(
        arglist,
        "\n".join(" {0}".format(line) for line in body.splitlines())), g)
    return g["anonfunc"]
This is one possible solution, but it is not ideal. Desirable features would be:
be smarter about indentation,
hide the innards better (e.g. don't have anonfunc in the function's scope)
provide access to variables in the surrounding scope / captures
better error handling
and some things I haven't thought of. I once had a really nice implementation that did most of the above, but unfortunately I lost it. I'm wondering if someone else has made something similar.
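A sketch (Python 3, CPython) that addresses some of the wished-for features: it dedents the body with textwrap, so the string can be indented to match the surrounding code; it keeps the helper name out of the caller's namespace; it makes the caller's variables visible through sys._getframe, which is a CPython implementation detail; and it compiles before executing, with a filename for tracebacks, so syntax errors surface immediately. Captures are a snapshot, not true closures: a name rebound in the caller after define() is not seen by the function. The optional name parameter is a hypothetical addition.

```python
import sys
import textwrap


def define(arglist, body, name="<anonymous>"):
    """Build a function from source text; a sketch, not a full `def` replacement."""
    caller = sys._getframe(1)  # the frame that called define()
    source = "def __anon__({}):\n{}".format(
        arglist, textwrap.indent(textwrap.dedent(body), "    "))
    code = compile(source, name, "exec")  # SyntaxError is raised here, early
    # Snapshot of the caller's globals plus locals, used as the function's globals.
    namespace = dict(caller.f_globals)
    namespace.update(caller.f_locals)
    exec(code, namespace)
    func = namespace.pop("__anon__")  # don't leave the helper name behind
    func.__name__ = func.__qualname__ = name
    return func


def outer():
    greeting = "Hello"
    return define("name", """
        return greeting + ", " + name
    """, name="greet")


print(outer()("World"))
```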
Disclaimer:
I'm well aware this is controversial among Python users and regarded as a hack or unpythonic. I'm also aware of the discussions regarding multi-line lambdas on the python-dev mailing list, and that a similar feature was omitted on purpose. However, from the same discussions I've learned that many others are also interested in such a function.
I'm not asking whether this is a good idea or not, but instead: Given that one has decided to implement this, (either out of fun and curiosity, madness, genuinely thinking this is a nice idea, or being held at gunpoint) how to make anonymous define work as close as possible to def using python's (2.7 or 3.x) current facilities?
Examples:
A bit more as to why, this can be really handy for callbacks in GUIs:
# gtk example:
self.ntimes = 0
button.connect('clicked', define('*a', '''
self.ntimes += 1
label.set_text("Button has been clicked %d times" % self.ntimes)
'''))
The benefit over defining a function with def is that your code is in a more logical order. This is simplified code taken from a Twisted application:
# twisted example:
def sayHello(self):
    d = self.callRemote(HelloCommand)
    def handle_response(response):
        # do something, this happens after (x)!
        pass
    d.addCallback(handle_response) # (x)
Note how it seems out of order. I usually break stuff like this up, to keep the code order == execution order:
def sayHello_d(self):
    d = self.callRemote(HelloCommand)
    d.addCallback(self._sayHello_2)
    return d
def _sayHello_2(self, response):
    # handle response
    pass
This is better with respect to ordering but more verbose. Now, with the anonymous functions trick:
d = self.callRemote(HelloCommand)
d.addCallback(define('response', '''
print "callback"
print "got response from", response["name"]
'''))
If you come from a JavaScript or Ruby background, Python's abilities to deal with anonymous functions may indeed seem limited, but this is for a reason. Python's designers decided that clarity of code is more important than conciseness. If you don't like that, you probably don't like Python at all. There's nothing wrong with that; there are many other choices, so why not try a language that suits your taste better?
Putting chunks of code into strings and interpreting them on the fly is definitely the wrong way to "extend" a language, simply because none of the tools you're working with, from syntax highlighters to the Python interpreter itself, can deal with "stringified" code in a sensible way.
To answer the question as asked: what you're doing there is essentially an attempt to construct a better-than-Python programming language and compile it to Python on the fly. The idea is not new in the world of scripting languages and can be productive or not (CoffeeScript is an example of a successful implementation), but your approach is wrong. format() is not the tool you're looking for when working with code. If you're writing a compiler, do it properly: use a parser (e.g. pyparsing) to read your code into an AST, walk the AST to generate Python code (or even bytecode), catch syntax errors as you go, and take measures to provide better runtime feedback (e.g. error context, line numbers, etc.). Finally, make sure your compiler works across different Python versions and implementations.
Or just use Ruby.
