Parsing an equation with sub-formulas in python - python

I'm trying to develop an equation parser using a compiler approach in Python. The main issue that I encounter is that it is more likely that I don't have all the variables and need therefore to look for sub-formulas. Let's show an example that is worth a thousand words ;)
I have four variables whom I know the values: vx, vy, vz and c:
list_know_var = ['vx', 'vy', 'vz', 'c']
and I want to compute the Mach number (M) defined as
equation = 'M = V / c'
I already know the c variable but I don't know V. However, I know that the velocity V that can be computed using the vx, vy and vz and this is stored in a dictionary with other formulas (here only one sub formula is shown)
know_equations = {'V': '(vx ** 2 + vy ** 2 + vz ** 2) ** 0.5'}
Therefore, what I need is to parse the equation and check if I have all the variables. If not, I shall look into the know_equations dictionary to see if an equation is defined for it and this recursively until my equation is well defined.
For now on, I have been able using the answer given here to parse my equation and know if I know all the variables. The issue is that I did not find a way to replace the unknown variable (here V) by its expression in know_equation:
parsed_equation = compiler.parse(equation)
list_var = re.findall("Name\(\'(\w*)\'\)", str(parsed_equation.getChildNodes()[0]))
list_unknow_var = list(set(list_var) - set(list_know_var))
for var in list_unknow_var:
if var in know_equations:
# replace var in equation by its expression given in know_equations
# and repeate until all the variables are known or raise Error
pass
Thank you in advance for your help, much appreciate!
Adrien

So i'm spitballing a bit, but here goes.
The compiler.parse function returns an instance of compiler.ast.Module which contains an abstract syntax tree. You can traverse this instance using the getChildNodes method. By recursively examining the left and right attributes of the nodes as you traverse the tree you can isolate compiler.ast.Name instances and swap them out for your substitution expressions.
So a worked example might be:
import compiler
def recursive_parse(node,substitutions):
# look for left hand side of equation and test
# if it is a variable name
if hasattr(node.left,"name"):
if node.left.name in substitutions.keys():
node.left = substitutions[node.left.name]
else:
# if not, go deeper
recursive_parse(node.left,substitutions)
# look for right hand side of equation and test
# if it is a variable name
if hasattr(node.right,"name"):
if node.right.name in substitutions.keys():
node.right = substitutions[node.right.name]
else:
# if not, go deeper
recursive_parse(node.right,substitutions)
def main(input):
substitutions = {
"r":"sqrt(x**2+y**2)"
}
# each of the substitutions needs to be compiled/parsed
for key,value in substitutions.items():
# this is a quick ugly way of getting the data of interest
# really this should be done in a programatically cleaner manner
substitutions[key] = compiler.parse(substitutions[key]).getChildNodes()[0].getChildNodes()[0].getChildNodes()[0]
# compile the input expression.
expression = compiler.parse(input)
print "Input: ",expression
# traverse the selected input, here we only pass the branch of interest.
# again, as with above, this done quick and dirty.
recursive_parse(expression.getChildNodes()[0].getChildNodes()[0].getChildNodes()[1],substitutions)
print "Substituted: ",expression
if __name__ == "__main__":
input = "t = r*p"
main(input)
I have admittedly only tested this on a handful of use cases, but I think the basis is there for a generic implementation that can handle a wide variety of inputs.
Running this, I get the output:
Input: Module(None, Stmt([Assign([AssName('t', 'OP_ASSIGN')], Mul((Name('r'), Name('p'))))]))
Substituted: Module(None, Stmt([Assign([AssName('t', 'OP_ASSIGN')], Mul((CallFunc(Name('sqrt'), [Add((Power((Name('x'), Const(2))), Power((Name('y'), Const(2)))))], None, None), Name('p'))))]))
EDIT:
So the compiler module is depreciated in Python 3.0, so a better (and cleaner) solution would be to use the ast module:
import ast
from math import sqrt
# same a previous recursion function but with looking for 'id' not 'name' attribute
def recursive_parse(node,substitutions):
if hasattr(node.left,"id"):
if node.left.id in substitutions.keys():
node.left = substitutions[node.left.id]
else:
recursive_parse(node.left,substitutions)
if hasattr(node.right,"id"):
if node.right.id in substitutions.keys():
node.right = substitutions[node.right.id]
else:
recursive_parse(node.right,substitutions)
def main(input):
substitutions = {
"r":"sqrt(x**2+y**2)"
}
for key,value in substitutions.items():
substitutions[key] = ast.parse(substitutions[key], mode='eval').body
# As this is an assignment operation, mode must be set to exec
module = ast.parse(input, mode='exec')
print "Input: ",ast.dump(module)
recursive_parse(module.body[0].value,substitutions)
print "Substituted: ",ast.dump(module)
# give some values for the equation
x = 3
y = 2
p = 1
code = compile(module,filename='<string>',mode='exec')
exec(code)
print input
print "t =",t
if __name__ == "__main__":
input = "t = r*p"
main(input)
This will compile the expression and execute it in the local space. The output should be:
Input: Module(body=[Assign(targets=[Name(id='t', ctx=Store())], value=BinOp(left=Name(id='r', ctx=Load()), op=Mult(), right=Name(id='p', ctx=Load())))])
Substituted: Module(body=[Assign(targets=[Name(id='t', ctx=Store())], value=BinOp(left=Call(func=Name(id='sqrt', ctx=Load()), args=[BinOp(left=BinOp(left=Name(id='x', ctx=Load()), op=Pow(), right=Num(n=2)), op=Add(), right=BinOp(left=Name(id='y', ctx=Load()), op=Pow(), right=Num(n=2)))], keywords=[], starargs=None, kwargs=None), op=Mult(), right=Name(id='p', ctx=Load())))])
t = r*p
t = 3.60555127546

Related

How to solve a Linear System of Equations in Python When the Coefficients are Unknown (but still real numbers)

Im not a programer so go easy on me please ! I have a system of 4 linear equations and 4 unknowns, which I think I could use python to solve relatively easily. However my equations not of the form " 5x+2y+z-w=0 " instead I have algebraic constants c_i which I dont know the explicit numerical value of, for example " c_1 x + c_2 y + c_3 z+ c_4w=c_5 " would be one my four equations. So does a solver exist which gives answers for x,y,z,w in terms of the c_i ?
Numpy has a function for this exact problem: numpy.linalg.solve
To construct the matrix we first need to digest the string turning it into an array of coefficients and solutions.
Finding Numbers
First we need to write a function that takes a string like "c_1 3" and returns the number 3.0. Depending on the format you want in your input string you can either iterate over all chars in this array and stop when you find a non-digit character, or you can simply split on the space and parse the second string. Here are both solutions:
def find_number(sub_expr):
"""
Finds the number from the format
number*string or numberstring.
Example:
3x -> 3
4*x -> 4
"""
num_str = str()
for char in sub_expr:
if char.isdigit():
num_str += char
else:
break
return float(num_str)
or the simpler solution
def find_number(sub_expr):
"""
Returns the number from the format "string number"
"""
return float(sub_expr.split()[1])
Note: See edits
Get matrices
Now we can use that to split each expression into two parts: The solution and the equation by the "=". The equation is then split into sub_expressions by the "+" This way we would end turn the string "3x+4y = 3" into
sub_expressions = ["3x", "4y"]
solution_string = "3"
Each sub expression then needs to be fed into our find_numbers function. The End result can be appended to the coefficient and solution matrices:
def get_matrices(expressions):
"""
Returns coefficient_matrix and solutions from array of string-expressions.
"""
coefficient_matrix = list()
solutions = list()
last_len = -1
for expression in expressions:
# Note: In this solution all coefficients must be explicitely noted and must always be in the same order.
# Could be solved with dicts but is probably overengineered.
if not "=" in expression:
print(f"Invalid expression {expression}. Missing \"=\"")
return False
try:
c_string, s_string = expression.split("=")
c_strings = c_string.split("+")
solutions.append(float(s_string))
current_len = len(c_strings)
if last_len != -1 and current_len != last_len:
print(f"The expression {expression} has a mismatching number of coefficients")
return False
last_len = current_len
coefficients = list()
for c_string in c_strings:
coefficients.append(find_number(c_string))
coefficient_matrix.append(coefficients)
except Exception as e:
print(f"An unexpected Runtime Error occured at {coefficient}")
print(e)
exit()
return coefficient_matrix, solutions
Now let's write a simple main function to test this code:
# This is not the code you want to copy-paste
# Look further down.
from sys import argv as args
def main():
expressions = args[1:]
matrix, solutions = get_matrices(expressions)
for row in matrix:
print(row)
print("")
print(solutions)
if __name__ == "__main__":
main()
Let's run the program in the console!
user:$ python3 solve.py 2x+3y=4 3x+3y=2
[2.0, 3.0]
[3.0, 3.0]
[4.0, 2.0]
You can see that the program identified all our numbers correctly
AGAIN: use the find_number function appropriate for your format
Put The Pieces Together
These Matrices now just need to be pumped directly into the numpy function:
# This is the main you want
from sys import argv as args
from numpy.linalg import solve as solve_linalg
def main():
expressions = args[1:]
matrix, solutions = get_matrices(expressions)
coefficients = solve_linalg(matrix, solutions)
print(coefficients)
# This bit needs to be at the very bottom of your code to load all functions first.
# You could just paste the main-code here, but this is considered best-practice
if __name__ == '__main__':
main()
Now let's test that:
$ python3 solve.py x*2+y*4+z*0=20 x*1+y*1+z*-1=3 x*2+y*2+z*-3=3
[2. 4. 3.]
As you can see the program now solves the functions for us.
Out of curiosity: Math homework? This feels like math homework.
Edit: Had a typo "c_string" instead of "c_strings" worked out in all tests out of pure and utter luck.
Edit 2: Upon further inspection I would reccomend to split the sub-expressions by a "*":
def find_number(sub_expr):
"""
Returns the number from the format "string number"
"""
return float(sub_expr.split("*")[1])
This results in fairly readable input strings

How do I print out the constant names when printing all the constants in a C file using pycparser?

I'm working on automating a tool that prints out all constants in a C file. So far, I have managed to print out all the constants in a C file but I can't figure out of a way to show the variable names they are associated with without printing out the whole abstract syntax tree, which has a lot of unnecessary information for me. Does anyone have any ideas? Right now, it will print out the constants, and their type. Here is my code:
from pycparser import c_parser, c_ast, parse_file
class ConstantVisitor(c_ast.NodeVisitor):
def __init__(self):
self.values = []
def visit_Constant(self, node):
self.values.append(node.value)
node.show(showcoord=True,nodenames=True,attrnames=True)
def show_tree(filename):
# Note that cpp is used. Provide a path to your own cpp or
# make sure one exists in PATH.
ast = parse_file(filename, use_cpp=True,cpp_args=['-E', r'-Iutils/fake_libc_include'])
cv = ConstantVisitor()
cv.visit(ast)
if __name__ == "__main__":
if len(sys.argv) > 1:
filename = sys.argv[1]
else:
filename = 'xmrig-master/src/crypto/c_blake256.c'
show_tree(filename)
edit:
current output: constant: type=int, value=0x243456BE
desired output: constant: type=int, name=variable name constant belongs to(usually an array name), value=0x243456BE
You may need to create a more sophisticated visitor if you want to retain information about parent nodes of the visited node. See this FAQ answer for an example.
That said, you will also need to define your goal much more clearly. Constant nodes are not always associated with a variable. For example:
return a + 30 + 50;
Has two Constant nodes in it (for 30 and for 50); what variable are they associated with?
Perhaps what you're looking for is variable declarations - Decl nodes with names. Then once you find a Decl node, do another visiting under this node looking for all Constant nodes.

lambdify a sympy expression that contains a Derivative of UndefinedFunction

I have several expressions of an undefined function some of which contain the corresponding (undefined) derivatives of that function. Both the function and its derivatives exist only as numerical data. I want to make functions out of my expressions and then call that function with the corresponding numerical data to numerically compute the expression. Unfortunately I have run into a problem with lambdify.
Consider the following simplified example:
import sympy
import numpy
# define a parameter and an unknown function on said parameter
t = sympy.Symbol('t')
s = sympy.Function('s')(t)
# a "normal" expression
a = t*s**2
print(a)
#OUT: t*s(t)**2
# an expression which contains a derivative
b = a.diff(t)
print(b)
#OUT: 2*t*s(t)*Derivative(s(t), t) + s(t)**2
# generate an arbitrary numerical input
# for demo purposes lets assume that s(t):=sin(t)
t0 = 0
s0 = numpy.sin(t0)
sd0 = numpy.cos(t0)
# labdify a
fa = sympy.lambdify([t, s], a)
va = fa(t0, s0)
print (va)
#OUT: 0
# try to lambdify b
fb = sympy.lambdify([t, s, s.diff(t)], b) # this fails with syntax error
vb = fb(t0, s0, sd0)
print (vb)
Error message:
File "<string>", line 1
lambda _Dummy_142,_Dummy_143,Derivative(s(t), t): (2*_Dummy_142*_Dummy_143*Derivative(_Dummy_143, _Dummy_142) + _Dummy_143**2)
^
SyntaxError: invalid syntax
Apparently the Derivative object is not resolved correctly, how can I work around that?
As an alternative to lambdify I'm also open to using theano or cython based solutions, but I have encountered similar problems with the corresponding printers.
Any help is appreciated.
As far as I can tell, the problem originates from an incorrect/unfortunate dummification process within the lambdify function. I have written my own dummification function that I apply to the parameters as well as the expression before passing them to lambdifying.
def dummify_undefined_functions(expr):
mapping = {}
# replace all Derivative terms
for der in expr.atoms(sympy.Derivative):
f_name = der.expr.func.__name__
var_names = [var.name for var in der.variables]
name = "d%s_d%s" % (f_name, 'd'.join(var_names))
mapping[der] = sympy.Symbol(name)
# replace undefined functions
from sympy.core.function import AppliedUndef
for f in expr.atoms(AppliedUndef):
f_name = f.func.__name__
mapping[f] = sympy.Symbol(f_name)
return expr.subs(mapping)
Use like this:
params = [dummify_undefined_functions(x) for x in [t, s, s.diff(t)]]
expr = dummify_undefined_functions(b)
fb = sympy.lambdify(params, expr)
Obviously this is somewhat brittle:
no guard against name-collisions
perhaps not the best possible name-scheme: df_dxdy for Derivative(f(x,y), x, y)
it is assumed that all derivatives are of the form:
Derivative(s(t), t, ...) with s(t) being an UndefinedFunction and t a Symbol. I have no idea what will happen if any argument to Derivative is a more complex expression. I kind of think/hope that the (automatic) simplification process will reduce any more complex derivative into an expression consisting of 'basic' derivatives. But I certainly do not guard against it.
largely untested (except for my specific use-cases)
Other than that it works quite well.
First off, rather than an UndefinedFunction, you could go ahead and use the implemented_function function to tie your numerical implementation of s(t) to a symbolic function.
Then, if you are constrained to discrete numerical data defining the function whose derivative occurs in the troublesome expression, much of the time, the numerical evaluation of the derivative may come from finite differences. As an alternative, sympy can automatically replace derivative terms with finite differences, and let the resulting expression be converted to a lambda. For example:
import sympy
import numpy
from sympy.utilities.lambdify import lambdify, implemented_function
from sympy import Function
# define a parameter and an unknown function on said parameter
t = sympy.Symbol('t')
s = implemented_function(Function('s'), numpy.cos)
print('A plain ol\' expression')
a = t*s(t)**2
print(a)
print('Derivative of above:')
b = a.diff(t)
print(b)
# try to lambdify b by first replacing with finite differences
dx = 0.1
bapprox = b.replace(lambda arg: arg.is_Derivative,
lambda arg: arg.as_finite_difference(points=dx))
print('Approximation of derivatives:')
print(bapprox)
fb = sympy.lambdify([t], bapprox)
t0 = 0.0
vb = fb(t0)
print(vb)
The similar question was discussed at here
You just need to define your own function and define its derivative as another function:
def f_impl(x):
return x**2
def df_impl(x):
return 2*x
class df(sy.Function):
nargs = 1
is_real = True
_imp_ = staticmethod(df_impl)
class f(sy.Function):
nargs = 1
is_real = True
_imp_ = staticmethod(f_impl)
def fdiff(self, argindex=1):
return df(self.args[0])
t = sy.Symbol('t')
print f(t).diff().subs({t:0.1})
expr = f(t) + f(t).diff()
expr_real = sy.lambdify(t, expr)
print expr_real(0.1)

Fill Z3Py Tree with values

I have this tree datatype, where I want to be able to go up and down. But actually, any structure will work, as long as I can get the parent/left child/right child of a node.
Tree = Datatype('Tree')
Tree.declare('nil')
Tree.declare('node', ('label', IntSort()), ('left', Tree), ('up', Tree), ('right', Tree))
Tree = Tree.create()
But I don't know how to fill the datastructure... I thought I could create the tree one node at a time like the following, but that seems to be the wrong way.
trees = [Const('t' + str(i), Tree) for i in range(3)]
Tree.up(trees[0]) = Tree.nil
Tree.left(trees[0]) = trees[1]
Tree.right(trees[0]) = trees[2]
Tree.up(trees[1]) = trees[0]
Tree.up(trees[2]) = trees[0]
Thanks for your help.
I have the feeling that you are slightly mistaken about how such data structures work. Z3's language, and thus also Z3py's, is the language of first-order logic (modulo theories), in which only assumptions/facts exist. One of the major differences with respect to imperative programming languages is, that you don't have destructive updates*, i.e., you cannot update the value of an entity.
In your code you try to build the data structures in an imperative way by essentially manipulating pointers, e. g., by the (attempted) assignment
Tree.up(trees[0]) = Tree.nil
What you probably had in mind was "making trees[0].up point to nil". However, since destructive updates are not part of a first-order language, you instead have to "think functional", i.e., "assume that up(trees[0]) == nil". Adding such assumptions can be done as follows (online on rise4fun):
s = Solver()
s.add(
Tree.up(trees[0]) == Tree.nil,
Tree.left(trees[0]) == trees[1],
Tree.right(trees[0]) == trees[2],
Tree.up(trees[1]) == trees[0],
Tree.up(trees[2]) == trees[0]
)
print s.check() # sat -> your constraints are satisfiable
A consequence of not having destructive updates is that you cannot modify your data structure in a way typical for imperative programming languages. With such updates it would be possible to modify the tree such that trees[1].up.left != trees[1], for example, by the assignment trees[1].up.left = trees[2] (for trees[1] != trees[2]). However, If you add corresponding assumptions
s.add(
trees[1] != trees[2],
Tree.left(Tree.up(trees[1])) == trees[2]
)
print s.check() # unsat
you'll see that your constraints are no longer satisfiable, because the old and new assumptions contradict.
By the way, your definition of the Tree datatype makes it possible to assume that Tree.up(nil) == someTree. Assuming that t1 != t2 are leaves of a tree and that left(t1) == right(t1) == nil and analogous for t2, then you'll have a contradiction if up(nil) == t1 and up(nil) == t2. I don't know how to prevent this on the level of Z3, though.
* Destructive updates could be added as syntactic sugar because they can be turned into assumptions by a so-called passification transformation. This, for example, is done in the intermediate verification language Boogie.

homogenization the functions can be compiled into a calculate networks?

Inside of a network, information (package) can be passed to different node(hosts), by modify it's content it can carry different meaning. The final package depends on hosts input via it's given route of network.
Now I want to implement a calculating network model can do small jobs by give different calculate path.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def link(p, r):
p1 = p
for x in r:
p1 = x(p1)
return p1
p = 100
route = [a,c,d]
result = link(p,result)
#========
target_result = 108
if result = target_result:
# route is OK
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here is my questions hope may get your help:
can p carry(determin) the route(s) by inspect the function and estmated result?
(1.1 ) for example, if on the route there's a node x()
def x(p): return x / 0 # I suppose it can pass the compile
can p know in somehow this path is not good then avoid select this path?
(1.2) Another confuse is if p is a self-defined class type, the payload inside of this class essentially is a string, when it carry with a path [a,c,d], can p know a() must with a int type then avoid to select this node?'
same as 1.2 when generating the path, can I avoid such oops
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)
full_node_list = [a,b,x,y]
path = random(2,full_node_list) # oops x,y will be trouble for the inttype P and a,b will be trouble to list type.
pls consider if the path is lambda list of functions
PS: as the whole model is not very clear in my mind the any leading and directing will be appreciated.
THANKS!
You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
testData = [1,2,3,8,38,73,159] # random test input
goodEnough = 0.8 * len(testData) # need 80% pass rate
try:
good = 0
for i in testData:
if type(f(i)) is int:
good += 1
return good >= goodEnough
except:
return False
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
try:
for x in r:
p = x(p)
except ZeroDivisionError, AttributeError: # List whatever errors you want to catch
return None
return p

Categories