Related
Hope to find some help here with a new excercise.
I luckily understand how to use recursive functions, but this one is killing me, im probably just thinking too much outside of the box.
We're given a string:
c = "3+4*5+6+1*3"
And now we have to code a function, which recursivly gives us the result of that calculation.
Now i know the recursive end should be the length of the string, which should be 1.
Our professor did give us another example which we should use for this function.
int(number)
string.split(symbol, 1)
we have following code given:
c = "3+4*5+6+1*3"
print(c)
print()
sub1, sub2 = c.split("+", 1)
print("Result with '+':")
print("sub1=" + sub1)
print("sub2=" + sub2)
print()
sub1, sub2 = c.split("*", 1)
print("Result with '*':")
print("sub1=" + sub1)
print("sub2=" + sub2)
print()
My thoughts were to split the strings to a minimum, so i can turn them into integers and than sum them together. But im absolutly lost there how the code should look like, im a real beginner so im really sorry. I dont even know it the beginning i was thinking of is right. Still hoping, someone can help!
what i had:
def calc(string):
if len(string) == 1:
return string
Thanks for all of you!
Greets
Chrissi
I created a function that solves equations like these. However, the only characters possible are numbers, and operators add (+) and mult (*). If you try to use any other characters such as spaces, there's going to be errors.
# Solves a mathematical equation containing digits [0-9], and operators
# such as add + or multiply *
def solve(equation, operators, oindex=0):
# If an operator is available, use the operator
if (oindex < len(operators)):
# Get the current operator and pair: a op b
op = operators[oindex]
pair = equation.split(op, 1)
# If the pair is a pair (has 2 elements)
if (len(pair) == 2):
# Solve left side
a = solve(pair[0], operators)
# Solve right side
b = solve(pair[1], operators)
# If current operator is multiply: multiply a and b
if (op == '*'):
print(a, '*', b, '=', a*b)
return a * b
# If current operator is add: add a and b
elif (op == '+'):
print(a, '+', b, '=', a+b)
return a + b
# If it's not a pair, try using another operator
else:
return solve(equation, operators, oindex+1)
else:
# If no operators are available, then equation is
# just a number.
return int(equation)
if __name__ == '__main__':
equation = "3+4*5+6+1*3"
# If mult (*) takes precedence, operator order is "+*"
# > 3+4*5+6+1*3 = ((3)+((4*5)+((6)+(1*3))))
# = ((3)+((20)+((6)+(3))))
# = ((3)+(20+(9)))
# = ((3)+(29))
# = (32)
print("MULT, then ADD -> ",equation + " = ", solve(equation, "+*"))
# If add (+) takes precedence, operator order is "*+"
# > 3+4*5+6+1*3 = ((3+4)*(((5)+(6+1))*(3)))
# = ((7)*((5+7)*(3)))
# = ((7)*(12*3))
# = (7*36)
# = (252)
print("ADD, then MULT -> ",equation + " = ", solve(equation, "*+"))
To set the operators, you can use the paramter operators, which is a string that takes all supported operators. In this case: +*.
The order of these characters matter, changing the precedence of each operator inside the equation.
Here's the output:
4 * 5 = 20
1 * 3 = 3
6 + 3 = 9
20 + 9 = 29
3 + 29 = 32
MULT, then ADD -> 3+4*5+6+1*3 = 32
3 + 4 = 7
6 + 1 = 7
5 + 7 = 12
12 * 3 = 36
7 * 36 = 252
ADD, then MULT -> 3+4*5+6+1*3 = 252
Context
Firstly, thanks for hypothesis. It's both extremely powerful and extremely useful!
I've written a hypothesis strategy to produce monotonic (ANDS and ORs) policy expressions of the form:
(A and (B or C))
This can be thought of as a tree structure, where A, B and C are attributes at the leaf nodes, whereas 'and' and 'or' are non-leaf nodes.
The strategy seems to generate expressions as desired.
>>> find(policy_expressions(), lambda x: len(x.split()) > 3)
'(A or (A or A))'
(Perhaps the statistical diversity of examples could be improved, but that is not the essence of this question).
Inequalities are valid too. For example:
(N or (WlIorO and (nX <= 55516 and e)))
I want to constrain or filter the examples so that I can generate policy expressions with a specified number of leaf nodes (i.e. attributes).
For a performance test, I've tried using data.draw() with filter something like this:
#given(data=data())
def test_keygen_encrypt_proxy_decrypt_decrypt_execution_time(data, n):
"""
:param n: the input size n. Number of attributes or leaf nodes in policy tree.
"""
policy_str = data.draw(strategy=policy_expressions().filter(lambda x: len(extract_attributes(group, x)) == n),
label="policy string")
Where extract_attributes() counts the number of leaf nodes in the expression and n is the desired number of leaves.
The problem with this solution is that when n > 16, hypothesis throws a:
hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of hypothesis test_keygen_encrypt_proxy_decrypt_decrypt_execution_time.
I want to generate valid policy expressions with 100s of leaf nodes.
Another drawback of that approach was that hypothesis reported HealthCheck.filter_too_much and HealthCheck.too_slow and the #settings got ugly.
I would rather have a parameter to say policy_expressions(leaf_nodes=4) to get an example like this:
(N or (WlIorO and (nX <= 55516 and e)))
I avoided that initially, because I'm not able to see how to do it with the recursive strategy code.
Question
Can you suggest a way to refactor this strategy so that it can be parameterized for number of leaf nodes?
Here's the strategy code (its open source in Charm Crypto anyway)
from hypothesis.strategies import text, composite, sampled_from, characters, one_of, integers
def policy_expressions():
return one_of(attributes(), inequalities(), policy_expression())
#composite
def policy_expression(draw):
left = draw(policy_expressions())
right = draw(policy_expressions())
gate = draw(gates())
return u'(' + u' '.join((left, gate, right)) + u')'
def attributes():
return text(min_size=1, alphabet=characters(whitelist_categories='L', max_codepoint=0x7e))
#composite
def inequalities(draw):
attr = draw(attributes())
oper = draw(inequality_operators())
numb = draw(integers(min_value=1))
return u' '.join((attr, oper, str(numb)))
def inequality_operators():
return sampled_from((u'<', u'>', u'<=', u'>='))
def gates():
return sampled_from((u'or', u'and'))
def assert_valid(policy_expression):
assert policy_expression # not empty
assert policy_expression.count(u'(') == policy_expression.count(u')')
https://github.com/JHUISI/charm/blob/dev/charm/toolbox/policy_expression_spec.py
I'd suggest explicitly building in the number of leaves into how the data is constructed, then passing in the number of leaves you want:
from hypothesis.strategies import text, composite, sampled_from, characters, one_of, integers
def policy_expressions_of_size(num_leaves):
if num_leaves == 1:
return attributes()
elif num_leaves == 2:
return one_of(inequalities(), policy_expression(num_leaves))
else:
return policy_expression(num_leaves)
policy_expressions = integers(min_value=1, max_value=500).flatmap(policy_expressions_of_size)
#composite
def policy_expression(draw, num_leaves):
left_leaves = draw(integers(min_value=1, max_value=num_leaves - 1))
right_leaves = num_leaves - left_leaves
left = draw(policy_expressions_of_size(left_leaves))
right = draw(policy_expressions_of_size(right_leaves))
gate = draw(gates())
return u'(' + u' '.join((left, gate, right)) + u')'
def attributes():
return text(min_size=1, alphabet=characters(whitelist_categories='L', max_codepoint=0x7e))
#composite
def inequalities(draw):
attr = draw(attributes())
oper = draw(inequality_operators())
numb = draw(integers(min_value=1))
return u' '.join((attr, oper, str(numb)))
def inequality_operators():
return sampled_from((u'<', u'>', u'<=', u'>='))
def gates():
return sampled_from((u'or', u'and'))
Then you can pick exactly how large you want the policy expression to be:
>>> policy_expressions.example()
'((((((oOjFo or (((cH and (Q or (uO > 18 and byy))) and kS) or pqKUUZ > 74)) and (gi or mwsrU <= 4115)) and qLkVSTqXZxgScTj) and (vNJ > 969 and (Drwvh or (((xhmsWhHpc or hQSMnfgyiYnblLFJ) or sesfHbQ) and jt)))) or xS) and ((V and (mArqYR or qY)) or (((uVf and bbtKUCnecMKjRJD > 18944) and nerVkPSs < 29292) and (UlOJebfbgcJz or (bxfVfjgmfulSB > 71 or (jqGLlr or (zQqj and zqUGwc < 24845)))))))'
>>>
>>> policy_expressions_of_size(1).example()
'Eo'
>>>
>>> policy_expressions_of_size(2).example()
'KJAitOKC > 18179'
>>> policy_expressions_of_size(10).example()
'(((htjdVy or (((XTfZil or (rqZw and DEOeER)) and xGVsdeQJLTJxLsC < 388312303) or LxLfUPljUTH)) or (Kb or EoipoYzjncAGKTE)) or bc)'
>>> policy_expressions_of_size(100).example()
'(((((CxySeUrNW or bZG) or (gzSUGgTG and (((V or n) or wqA) or veuTEnjGKwIpkDDDBiQkMwsNbxrBv))) or (((SKgQSXtAg or ChCHcEsVavy) and (((Yxj and xcCX) or QrILGAWxVKXWRb > 98817811688973569232860005374239659122) or JD <= 28510)) and KhrGfZciz > 4057857855522854443)) and (ZMIzFELKAKDMrH and (((MOmAZ and J <= 22052) or (Scy >= 17563 and (VCS and ((FFLa and EtZvqwNymnZNnjlREM) or pU)))) or A))) and ((((kaYzzIXIu and (lwos and (vp and GqG))) and ((Nh and lb) or ((TbNZWYOpYmj and (AQs or w)) or NjFYLBr > 228431293))) or ((((FTSXkXGZyKXD or zXeVEqNgkyXI) or mNGI) or ((cGOGK or gjcI) and DQzYonXszfSrZMB)) and JI > 3802)) or (((jIREd and IVzFB >= 28149) and (UdCBg < 20 or (VSGxr or XBuiS <= 1615))) and (rE > 10511139808015932 and ((((((((W and u) or yslVZ) or (eVGlz < 7033 or UiE)) and ((trOmArBc and Zx) or mPKva)) or ((qqDmKUpAnW or yvSkhTgqXQaLnxL) or Z)) or snXcMDhhf) and ((Wu or XSjbKdsZqEiXXvOb) and (DNZg and qv >= 7503))) and ((rnffxTLThwvw >= 24460 and ((oO or y <= 24926) and (NjM and vEHukii))) or ((((BTdpW and rP) or (rjUylCZwJzGobXZR or MNoBdEEIuLbTRvZHMb < 7958346708112664935)) and ((YU or gY >= 15498) and (s and GnOydthO > 103))) or ((caumKPjp < 27 and OQoFXscbD) or ((qaxYwfnelmetYqHKnatQ or P) and (ixzsvX and mYROpqoHAqeEy))))))))))'
I want to write a Python code that will evaluate an expression using stack. I have the following code, where numStk is a stack that holds number and optStk that holds operators. In the expression 2+3*4-6, at the end of for loop, numStack contains 2, 12, and 6; and optStk contains - and +. Now how can I make my setOps() function to pop elements from the two stacks to do the evaluate the expression?
def main():
raw_expression = input("Enter an expression: ")
expression = raw_expression.replace(" ", "")
for i in expression:
if (i in numbers):
numStk.push(i)
else:
setOps(i)
optStk.push(i)
## code needed to evaluate the rest of the elements in stackstack
return valStk.top()
My setOps(i) function is as follow:
def repeatOps(refOp):
while (len(valStk) > 1 and operators.index(refOp) <= operators.index(optStk.top())):
x = numStk.pop()
y = numStk.pop()
op = optStk.pop()
numStk.push(str(eval(x+op+y)))
Even if I fill in all the stuff you left out, there are issues with your code: setOps() appears to be called repeatOps(); numStk is sometimes called valStk; you evaluate in the wrong order, e.g. "6-5" is evaluated "5-6"; you're calling eval()!
Below's my filling out and reworking of your code to address the above issues:
from collections import OrderedDict
DIGITS = "0123456789"
# position implies (PEMDAS) priority, low to high
OPERATORS = OrderedDict([ \
['+', lambda a, b: a + b], \
['-', lambda a, b: a - b], \
['*', lambda a, b: a * b], \
['/', lambda a, b: a / b], \
])
def operator_priority(character):
return list(OPERATORS.keys()).index(character)
class Stack(list):
""" minimalist stack implementation """
def push(self, thing):
self.append(thing)
def top(self):
return self[-1]
def evaluate(expression):
numStk = Stack()
optStk = Stack()
def setOps(refOp):
while numStk and optStk and operator_priority(refOp) <= operator_priority(optStk.top()):
y = numStk.pop()
x = numStk.pop()
op = optStk.pop()
print(x, op, y) # debugging
numStk.push(OPERATORS[op](x, y))
for i in expression:
if i in DIGITS:
numStk.push(int(i))
else:
setOps(i)
optStk.push(i)
if optStk:
# evaluate the rest of the elements in stacks
setOps(list(OPERATORS.keys())[0]) # trigger using lowest priority operator
return numStk.top()
if __name__ == "__main__":
raw_expression = input("Enter an expression: ")
expression = raw_expression.replace(" ", "")
print(evaluate(expression))
Far from perfect but something to get you going:
EXAMPLE
> python3 test.py
Enter an expression: 2+3*4-6
3 * 4
12 - 6
2 + 6
8
>
To address your original question, the key to finishing the evaluation seems to be running setOps() with a fictitious, low priority operator if there's anything left in the optStk.
I have a string that is a mathematical equation, but with some custom functions. I need to find all such functions and replace them with some code.
For example, I have a string:
a+b+f1(f2(x,y),x)
I want code that will replace (say) f2(x,y) with x+y^2 and f1(x,y) with sin(x+y).
It would be ideal if nested functions were supported, like in the example. However, it would still be useful if nesting was not supported.
As I understand from similar topics this can be done using a compiler module like compiler.parse(eq). How I can work with AST object created by compiler.parse(eq) to reconstruct my string back, replacing all found functions?
I need only to perform substitution and then string will be used in other program. Evaluation is not needed.
Here is a minimal working example (+, - , *, /, ** binary and unary operations and function call implemented). The priority of operations are set with parenthesis.
A little bit more than the functionality for the example given is done:
from __future__ import print_function
import ast
def transform(eq,functions):
class EqVisitor(ast.NodeVisitor):
def visit_BinOp(self,node):
#generate("=>BinOp")
generate("(")
self.visit(node.left)
self.visit(node.op)
#generate("ici",str(node.op),node._fields,node._attributes)
#generate(dir(node.op))
self.visit(node.right)
generate(")")
#ast.NodeVisitor.generic_visit(self,node)
def visit_USub(self,node):
generate("-")
def visit_UAdd(self,node):
generate("+")
def visit_Sub(self,node):
generate("-")
def visit_Add(self,node):
generate("+")
def visit_Pow(self,node):
generate("**")
def visit_Mult(self,node):
generate("*")
def visit_Div(self,node):
generate("/")
def visit_Name(self,node):
generate(node.id)
def visit_Call(self,node):
debug("function",node.func.id)
if node.func.id in functions:
debug("defined function")
func_visit(functions[node.func.id],node.args)
return
debug("not defined function",node.func.id)
#generate(node._fields)
#generate("args")
generate(node.func.id)
generate("(")
sep = ""
for arg in node.args:
generate (sep)
self.visit(arg)
sep=","
generate(")")
def visit_Num(self,node):
generate(node.n)
def generic_visit(self, node):
debug ("\n",type(node).__name__)
debug (node._fields)
ast.NodeVisitor.generic_visit(self, node)
def func_visit(definition,concrete_args):
class FuncVisitor(EqVisitor):
def visit_arguments(self,node):
#generate("visit arguments")
#generate(node._fields)
self.arguments={}
for concrete_arg,formal_arg in zip(concrete_args,node.args):
#generate(formal_arg._fields)
self.arguments[formal_arg.id]=concrete_arg
debug(self.arguments)
def visit_Name(self,node):
debug("visit Name",node.id)
if node.id in self.arguments:
eqV.visit(self.arguments[node.id])
else:
generate(node.id)
funcV=FuncVisitor()
funcV.visit(ast.parse(definition))
eqV=EqVisitor()
result = []
def generate(s):
#following line maybe usefull for debug
debug(str(s))
result.append(str(s))
eqV.visit(ast.parse(eq,mode="eval"))
return "".join(result)
def debug(*args,**kwargs):
#print(*args,**kwargs)
pass
Usage:
functions= {
"f1":"def f1(x,y):return x+y**2",
"f2":"def f2(x,y):return sin(x+y)",
}
eq="-(a+b)+f1(f2(+x,y),z)*4/365.12-h"
print(transform(eq,functions))
Result
((-(a+b)+(((sin((+x+y))+(z**2))*4)/365.12))-h)
WARNING
The code works with Python 2.7 and as it is AST dependent is not guaranteed to work with another version of Python. The Python 3 version doesn't work.
The full substitution is quite tricky. Here is my attempt to do it. Here we can successfully inline expressions,
but not in all scenarios. This code works on AST only, made by ast module. And uses codegen to stringify it back to code. The stringifying of ast and modifying ast in general is covered in other SO Q/A: "Parse a .py file, read the AST, modify it, then write back the modified source code".
First we define few helpers:
import ast
import codegen
import copy
def parseExpr(expr):
# Strip:
# Module(body=[Expr(value=
return ast.parse(expr).body[0].value
def toSource(expr):
return codegen.to_source(expr)
After that we define a substitution function using NodeTransformer.
For example:
substitute(parseExpr("a + b"), { "a": parseExpr("1") }) # 1 + b
The simulatenous substitution of multiple variables is needed to properly avoid nasty situations.
For example substituting both a and b for a + b in a + b.
The result should be (a + b) + (a + b), but if we substitute first a for a + b, we'll get (a + b) + b, and then substitute b, we'll get (a + (a + b)) + b which is the wrong result! So simultaneous is important:
class NameTransformer(ast.NodeTransformer):
def __init__(self, names):
self.names = names
def visit_Name(self, node):
if node.id in self.names:
return self.names[node.id]
else:
return node
def substitute(expr, names):
print "substitute"
for varName, varValue in names.iteritems():
print " name " + varName + " for " + toSource(varValue)
print " in " + toSource(expr)
return NameTransformer(names).visit(expr)
Then we write similar NodeTransformer to find calls, where we can inline function definitions:
class CallTransformer(ast.NodeTransformer):
def __init__(self, fnName, varNames, fnExpr):
self.fnName = fnName
self.varNames = varNames
# substitute in new fn expr for each CallTransformer
self.fnExpr = copy.deepcopy(fnExpr)
self.modified = False
def visit_Call(self, node):
if (node.func.id == self.fnName):
if len(node.args) == len(self.varNames):
print "expand call to " + self.fnName + "(" + (", ".join(self.varNames)) + ")" + " with arguments "+ ", ".join(map(toSource, node.args))
# We substitute in args too!
old_node = node
args = map(self.visit, node.args)
names = dict(zip(self.varNames, args))
node = substitute(self.fnExpr, names)
self.modified = True
return node
else:
raise Exception("invalid arity " + toSource(node))
else:
return self.generic_visit(node)
def substituteCalls(expr, definitions, n = 3):
while True:
if (n <= 0):
break
n -= 1
modified = False
for fnName, varNames, fnExpr in definitions:
transformer = CallTransformer(fnName, varNames, fnExpr)
expr = transformer.visit(expr)
modified = modified or transformer.modified
if not modified:
break
return expr
The substituteCalls is recursive so we can inline recursive functions too. Also there is an explicit limit, because some definitions might be infinitely recursive (as fact below). There is a bit of ugly looking copying, but it is required to separate different subtrees.
And the example code:
if True:
print "f1 first, unique variable names"
ex = parseExpr("a+b+f1(f2(x, y), x)")
ex = substituteCalls(ex, [
("f1", ["u", "v"], parseExpr("sin(u + v)")),
("f2", ["i", "j"], parseExpr("i + j ^ 2"))])
print toSource(ex)
print "---"
if True:
print "f1 first"
ex = parseExpr("a+b+f1(f2(x, y), x)")
ex = substituteCalls(ex, [
("f1", ["x", "y"], parseExpr("sin(x + y)")),
("f2", ["x", "y"], parseExpr("x + y ^ 2"))])
print toSource(ex)
print "---"
if True:
print "f2 first"
ex = parseExpr("f1(f1(x, x), y)")
ex = substituteCalls(ex, [
("f1", ["x", "y"], parseExpr("x + y"))])
print toSource(ex)
print "---"
if True:
print "fact"
ex = parseExpr("fact(n)")
ex = substituteCalls(ex, [
("fact", ["n"], parseExpr("n if n == 0 else n * fact(n-1)"))])
print toSource(ex)
print "---"
Which prints out:
f1 first, unique variable names
expand call to f1(u, v) with arguments f2(x, y), x
substitute
name u for f2(x, y)
name v for x
in sin((u + v))
expand call to f2(i, j) with arguments x, y
substitute
name i for x
name j for y
in ((i + j) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f1 first
expand call to f1(x, y) with arguments f2(x, y), x
substitute
name y for x
name x for f2(x, y)
in sin((x + y))
expand call to f2(x, y) with arguments x, y
substitute
name y for y
name x for x
in ((x + y) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f2 first
expand call to f1(x, y) with arguments f1(x, x), y
expand call to f1(x, y) with arguments x, x
substitute
name y for x
name x for x
in (x + y)
substitute
name y for y
name x for (x + x)
in (x + x)
((x + x) + ((x + x) + x))
---
fact
expand call to fact(n) with arguments n
substitute
name n for n
in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments (n - 1)
substitute
name n for (n - 1)
in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments ((n - 1) - 1)
substitute
name n for ((n - 1) - 1)
in n if (n == 0) else (n * fact((n - 1)))
n if (n == 0) else (n * (n - 1) if ((n - 1) == 0) else ((n - 1) * ((n - 1) - 1) if (((n - 1) - 1) == 0) else (((n - 1) - 1) * fact((((n - 1) - 1) - 1)))))
Unfortunately codegen version in pypi is buggy. It doesn't parenthesise expressions properly, even AST says they should. I used jbremer/codegen (pip install git+git://github.com/jbremer/codegen). It adds unnecessary parenthesis too, but it's better than no at all. Thanks to #XavierCombelle for the tip.
The substitution gets trickier if you have anonymous functions, i.e lambda. Then you need to rename variables. You could try to search for lambda calculus with substitution or implementation. Yet I had bad luck to find any articles which use Python for the task.
Do you know the variables beforehand?
I recommend using SymPy!
Take for example the following:
import sympy
a,b,x,y = sympy.symbols('a b x y')
f1 = sympy.Function('f1')
f2 = sympy.Function('f2')
readString = "a+b+f1(f2(x,y),x)"
z = eval(readString)
'z' will now be a symbolic term representing the mathematical formula. You can print it out. You can then use subs to replace symbolic terms or functions. You can either represent sine symbolically again (like f1 and f2) or you can possibly use the sin() in sympy.mpmath.
Depending on your needs, this approach is great because you can eventually compute, evaluate or simplify this expression.
What is your long term goal? Is it to evaluate the function or simply perform substitution? In the former case you can simply try this (note that f1 and f2 could also be dynamically defined):
import math
math.sin
def f2(x, y):
return x + y ** 2
def f1(x, y):
return math.sin(x + y)
a, b = 1, 2
x, y = 3, 4
eval('a + b + f1(f2(x, y), x)')
# 2.991148690709596
If you want to replace the functions and get back the modified version, you will indeed have to resort to some sort of AST parser. Be careful though with the use of eval, as this opens up a security hole for malicious user input code.
(Using sympy as adrianX suggested, with some extra code.)
Code below converts a given string to a new string after combining given functions. It's hasty and poorly documented, but it works.
WARNING!
Contains exec eval, malicious code could probably have an effected, if input is provided by external users.
UPDATE:
Rewrote the whole code. Works in Python 2.7.
Function arguments can be separated by comma or whitespace or both.
All examples in question and comments are working.
import re
import sympy
##################################################
# Input string and functions
initial_str = 'a1+myf1(myf2(a, b),y)'
given_functions = {'myf1(x,y)': 'cross(x,y)', 'myf2(a, b)': 'value(a,b)'}
##################################################
print '\nEXECUTED/EVALUATED STUFF:\n'
processed_str = initial_str
def fixed_power_op(str_to_fix):
return str_to_fix.replace('^', '**')
def fixed_multiplication(str_to_fix):
"""
Inserts multiplication symbol wherever omitted.
"""
pattern_digit_x = r"(\d)([A-Za-z])" # 4x -> 4*x
pattern_par_digit = r"(\))(\d)" # )4 -> )*4
pattern_digit_par = r"[^a-zA-Z]?_?(\d)(\()" # 4( -> 4*(
for patt in (pattern_digit_x, pattern_par_digit, pattern_digit_par):
str_to_fix = re.sub(patt, r'\1*\2', str_to_fix)
return str_to_fix
processed_str = fixed_power_op(processed_str)
class FProcessing(object):
def __init__(self, func_key, func_body):
self.func_key = func_key
self.func_body = func_body
def sliced_func_name(self):
return re.sub(r'(.+)\(.+', r'\1', self.func_key)
def sliced_func_args(self):
return re.search(r'\((.*)\)', self.func_key).group()
def sliced_args(self):
"""
Returns arguments found for given function. Arguments can be separated by comma or whitespace.
:returns (list)
"""
if ',' in self.sliced_func_args():
arg_separator = ','
else:
arg_separator = ' '
return self.sliced_func_args().replace('(', '').replace(')', '').split(arg_separator)
def num_of_sliced_args(self):
"""
Returns number of arguments found for given function.
"""
return len(self.sliced_args())
def functions_in_function_body(self):
"""
Detects functions in function body.
e.g. f1(x,y): sin(x+y**2), will result in "sin"
:returns (set)
"""
return set(re.findall(r'([a-zA-Z]+_?\w*)\(', self.func_body))
def symbols_in_func_body(self):
"""
Detects non argument symbols in function body.
"""
symbols_in_body = set(re.findall(r'[a-zA-Z]+_\w*', self.func_body))
return symbols_in_body - self.functions_in_function_body()
# --------------------------------------------------------------------------------------
# SYMBOL DETECTION (x, y, z, mz,..)
# Prohibited symbols
prohibited_symbol_names = set()
# Custom function names are prohibited symbol names.
for key in given_functions.keys():
prohibited_symbol_names |= {FProcessing(func_key=key, func_body=None).sliced_func_name()}
def symbols_in_str(provided_str):
"""
Returns a set of symbol names that are contained in provided string.
Allowed symbols start with a letter followed by 0 or more letters,
and then 0 or more numbers (eg. x, x1, Na, Xaa_sd, xa123)
"""
symbol_pattern = re.compile(r'[A-Za-z]+\d*')
symbol_name_set = re.findall(symbol_pattern, provided_str)
# Filters out prohibited.
symbol_name_set = {i for i in symbol_name_set if (i not in prohibited_symbol_names)}
return symbol_name_set
# ----------------------------------------------------------------
# EXEC SYMBOLS
symbols_in_given_str = symbols_in_str(initial_str)
# e.g. " x, y, sd = sympy.symbols('x y sd') "
symbol_string_to_exec = ', '.join(symbols_in_given_str)
symbol_string_to_exec += ' = '
symbol_string_to_exec += "sympy.symbols('%s')" % ' '.join(symbols_in_given_str)
exec symbol_string_to_exec
# -----------------------------------------------------------------------------------------
# FUNCTIONS
# Detects secondary functions (functions contained in body of given_functions dict)
sec_functions = set()
for key, val in given_functions.items():
sec_functions |= FProcessing(func_key=key, func_body=val).functions_in_function_body()
def secondary_function_as_exec_str(func_key):
"""
Used for functions that are contained in the function body of given_functions.
E.g. given_functions = {f1(x): sin(4+x)}
"my_f1 = sympy.Function('sin')(x)"
:param func_key: (str)
:return: (str)
"""
returned_str = "%s = sympy.Function('%s')" % (func_key, func_key)
print returned_str
return returned_str
def given_function_as_sympy_class_as_str(func_key, func_body):
"""
Converts given_function to sympy class and executes it.
E.g. class f1(sympy.Function):
nargs = (1, 2)
#classmethod
def eval(cls, x, y):
return cross(x+y**2)
:param func_key: (str)
:return: (None)
"""
func_proc_instance = FProcessing(func_key=func_key, func_body=func_body)
returned_str = 'class %s(sympy.Function): ' % func_proc_instance.sliced_func_name()
returned_str += '\n\tnargs = %s' % func_proc_instance.num_of_sliced_args()
returned_str += '\n\t#classmethod'
returned_str += '\n\tdef eval(cls, %s):' % ','.join(func_proc_instance.sliced_args())
returned_str = returned_str.replace("'", '')
returned_str += '\n\t\treturn %s' % func_body
returned_str = fixed_power_op(returned_str)
print '\n', returned_str
return returned_str
# Executes functions in given_functions' body
for name in sec_functions:
exec secondary_function_as_exec_str(func_key=name)
# Executes given_functions
for key, val in given_functions.items():
exec given_function_as_sympy_class_as_str(func_key=key, func_body=val)
final_result = eval(initial_str)
# PRINTING
print '\n' + ('-'*40)
print '\nRESULTS'
print '\nInitial string: \n%s' % initial_str
print '\nGiven functions:'
for key, val in given_functions.iteritems():
print '%s: ' % key, val
print '\nResult: \n%s' % final_result
I think you want to use something like PyBison which is a parser generator.
See an example that contains the basic code you need here:
http://freenet.mcnabhosting.com/python/pybison/calc.py
You need to add a token type for functions, and a rule for functions, and then what happens with that function if it is encountered.
If you need other information about parsing and so on, try to read some basic tutorials on Lex and (Yacc or Bison).
I am writing some quiz game and need computer to solve 1 game in the quiz if players fail to solve it.
Given data :
List of 6 numbers to use, for example 4, 8, 6, 2, 15, 50.
Targeted value, where 0 < value < 1000, for example 590.
Available operations are division, addition, multiplication and division.
Parentheses can be used.
Generate mathematical expression which evaluation is equal, or as close as possible, to the target value. For example for numbers given above, expression could be : (6 + 4) * 50 + 15 * (8 - 2) = 590
My algorithm is as follows :
Generate all permutations of all the subsets of the given numbers from (1) above
For each permutation generate all parenthesis and operator combinations
Track the closest value as algorithm runs
I can not think of any smart optimization to the brute-force algorithm above, which will speed it up by the order of magnitude. Also I must optimize for the worst case, because many quiz games will be run simultaneously on the server.
Code written today to solve this problem is (relevant stuff extracted from the project) :
from operator import add, sub, mul, div
import itertools
ops = ['+', '-', '/', '*']
op_map = {'+': add, '-': sub, '/': div, '*': mul}
# iterate over 1 permutation and generates parentheses and operator combinations
def iter_combinations(seq):
if len(seq) == 1:
yield seq[0], str(seq[0])
else:
for i in range(len(seq)):
left, right = seq[:i], seq[i:] # split input list at i`th place
# generate cartesian product
for l, l_str in iter_combinations(left):
for r, r_str in iter_combinations(right):
for op in ops:
if op_map[op] is div and r == 0: # cant divide by zero
continue
else:
yield op_map[op](float(l), r), \
('(' + l_str + op + r_str + ')')
numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None
for i in range(len(numbers)):
for current in itertools.permutations(numbers, i+1): # generate perms
for value, item in iter_combinations(list(current)):
if value < 0:
continue
if abs(target - value) < best_value:
best_value = abs(target - value)
best_item = item
print best_item
It prints : ((((4*6)+50)*8)-2). Tested it a little with different values and it seems to work correctly. Also I have a function to remove unnecessary parenthesis but it is not relevant to the question so it is not posted.
Problem is that this runs very slowly because of all this permutations, combinations and evaluations. On my mac book air it runs for a few minutes for 1 example. I would like to make it run in a few seconds tops on the same machine, because many quiz game instances will be run at the same time on the server. So the questions are :
Can I speed up current algorithm somehow (by orders of magnitude)?
Am I missing on some other algorithm for this problem which would run much faster?
You can build all the possible expression trees with the given numbers and evalate them. You don't need to keep them all in memory, just print them when the target number is found:
First we need a class to hold the expression. It is better to design it to be immutable, so its value can be precomputed. Something like this:
class Expr:
'''An Expr can be built with two different calls:
-Expr(number) to build a literal expression
-Expr(a, op, b) to build a complex expression.
There a and b will be of type Expr,
and op will be one of ('+','-', '*', '/').
'''
def __init__(self, *args):
if len(args) == 1:
self.left = self.right = self.op = None
self.value = args[0]
else:
self.left = args[0]
self.right = args[2]
self.op = args[1]
if self.op == '+':
self.value = self.left.value + self.right.value
elif self.op == '-':
self.value = self.left.value - self.right.value
elif self.op == '*':
self.value = self.left.value * self.right.value
elif self.op == '/':
self.value = self.left.value // self.right.value
def __str__(self):
'''It can be done smarter not to print redundant parentheses,
but that is out of the scope of this problem.
'''
if self.op:
return "({0}{1}{2})".format(self.left, self.op, self.right)
else:
return "{0}".format(self.value)
Now we can write a recursive function that builds all the possible expression trees with a given set of expressions, and prints the ones that equals our target value. We will use the itertools module, that's always fun.
We can use itertools.combinations() or itertools.permutations(), the difference is in the order. Some of our operations are commutative and some are not, so we can use permutations() and assume we will get many very simmilar solutions. Or we can use combinations() and manually reorder the values when the operation is not commutative.
import itertools
OPS = ('+', '-', '*', '/')
def SearchTrees(current, target):
''' current is the current set of expressions.
target is the target number.
'''
for a,b in itertools.combinations(current, 2):
current.remove(a)
current.remove(b)
for o in OPS:
# This checks whether this operation is commutative
if o == '-' or o == '/':
conmut = ((a,b), (b,a))
else:
conmut = ((a,b),)
for aa, bb in conmut:
# You do not specify what to do with the division.
# I'm assuming that only integer divisions are allowed.
if o == '/' and (bb.value == 0 or aa.value % bb.value != 0):
continue
e = Expr(aa, o, bb)
# If a solution is found, print it
if e.value == target:
print(e.value, '=', e)
current.add(e)
# Recursive call!
SearchTrees(current, target)
# Do not forget to leave the set as it were before
current.remove(e)
# Ditto
current.add(b)
current.add(a)
And then the main call:
NUMBERS = [4, 8, 6, 2, 15, 50]
TARGET = 590
initial = set(map(Expr, NUMBERS))
SearchTrees(initial, TARGET)
And done! With these data I'm getting 719 different solutions in just over 21 seconds! Of course many of them are trivial variations of the same expression.
24 game is 4 numbers to target 24, your game is 6 numbers to target x (0 < x < 1000).
That's much similar.
Here is the quick solution, get all results and print just one in my rMBP in about 1-3s, I think one solution print is ok in this game :), I will explain it later:
def mrange(mask):
#twice faster from Evgeny Kluev
x = 0
while x != mask:
x = (x - mask) & mask
yield x
def f( i ) :
global s
if s[i] :
#get cached group
return s[i]
for x in mrange(i & (i - 1)) :
#when x & i == x
#x is a child group in group i
#i-x is also a child group in group i
fk = fork( f(x), f(i-x) )
s[i] = merge( s[i], fk )
return s[i]
def merge( s1, s2 ) :
if not s1 :
return s2
if not s2 :
return s1
for i in s2 :
#print just one way quickly
s1[i] = s2[i]
#combine all ways, slowly
# if i in s1 :
# s1[i].update(s2[i])
# else :
# s1[i] = s2[i]
return s1
def fork( s1, s2 ) :
d = {}
#fork s1 s2
for i in s1 :
for j in s2 :
if not i + j in d :
d[i + j] = getExp( s1[i], s2[j], "+" )
if not i - j in d :
d[i - j] = getExp( s1[i], s2[j], "-" )
if not j - i in d :
d[j - i] = getExp( s2[j], s1[i], "-" )
if not i * j in d :
d[i * j] = getExp( s1[i], s2[j], "*" )
if j != 0 and not i / j in d :
d[i / j] = getExp( s1[i], s2[j], "/" )
if i != 0 and not j / i in d :
d[j / i] = getExp( s2[j], s1[i], "/" )
return d
def getExp( s1, s2, op ) :
exp = {}
for i in s1 :
for j in s2 :
exp['('+i+op+j+')'] = 1
#just print one way
break
#just print one way
break
return exp
def check( s ) :
num = 0
for i in xrange(target,0,-1):
if i in s :
if i == target :
print numbers, target, "\nFind ", len(s[i]), 'ways'
for exp in s[i]:
print exp, ' = ', i
else :
print numbers, target, "\nFind nearest ", i, 'in', len(s[i]), 'ways'
for exp in s[i]:
print exp, ' = ', i
break
print '\n'
def game( numbers, target ) :
global s
s = [None]*(2**len(numbers))
for i in xrange(0,len(numbers)) :
numbers[i] = float(numbers[i])
n = len(numbers)
for i in xrange(0,n) :
s[2**i] = { numbers[i]: {str(numbers[i]):1} }
for i in xrange(1,2**n) :
#we will get the f(numbers) in s[2**n-1]
s[i] = f(i)
check(s[2**n-1])
numbers = [4, 8, 6, 2, 2, 5]
s = [None]*(2**len(numbers))
target = 590
game( numbers, target )
numbers = [1,2,3,4,5,6]
target = 590
game( numbers, target )
Assume A is your 6 numbers list.
We define f(A) is all result that can calculate by all A numbers, if we search f(A), we will find if target is in it and get answer or the closest answer.
We can split A to two real child groups: A1 and A-A1 (A1 is not empty and not equal A) , which cut the problem from f(A) to f(A1) and f(A-A1). Because we know f(A) = Union( a+b, a-b, b-a, a*b, a/b(b!=0), b/a(a!=0) ), which a in A, b in A-A1.
We use fork f(A) = Union( fork(A1,A-A1) ) stands for such process. We can remove all duplicate value in fork(), so we can cut the range and make program faster.
So, if A = [1,2,3,4,5,6], then f(A) = fork( f([1]),f([2,3,4,5,6]) ) U ... U fork( f([1,2,3]), f([4,5,6]) ) U ... U stands for Union.
We will see f([2,3,4,5,6]) = fork( f([2,3]), f([4,5,6]) ) U ... , f([3,4,5,6]) = fork( f([3]), f([4,5,6]) ) U ..., the f([4,5,6]) used in both.
So if we can cache every f([...]) the program can be faster.
We can get 2^len(A) - 2 (A1,A-A1) in A. We can use binary to stands for that.
For example: A = [1,2,3,4,5,6], A1 = [1,2,3], then binary 000111(7) stands for A1. A2 = [1,3,5], binary 010101(21) stands for A2. A3 = [1], then binary 000001(1) stands for A3...
So we get a way stands for all groups in A, we can cache them and make all process faster!
All combinations for six number, four operations and parenthesis are up to 5 * 9! at least. So I think you should use some AI algorithm. Using genetic programming or optimization seems to be the path to follow.
In the book Programming Collective Intelligence in the chapter 11 Evolving Intelligence you will find exactly what you want and much more. That chapter explains how to find a mathematical function combining operations and numbers (as you want) to match a result. You will be surprised how easy is such task.
PD: The examples are written using Python.
I would try using an AST at least it will
make your expression generation part easier
(no need to mess with brackets).
http://en.wikipedia.org/wiki/Abstract_syntax_tree
1) Generate some tree with N nodes
(N = the count of numbers you have).
I've read before how many of those you
have, their size is serious as N grows.
By serious I mean more than polynomial to say the least.
2) Now just start changing the operations
in the non-leaf nodes and keep evaluating
the result.
But this is again backtracking and too much degree of freedom.
This is a computationally complex task you're posing. I believe if you
ask the question as you did: "let's generate a number K on the output
such that |K-V| is minimal" (here V is the pre-defined desired result,
i.e. 590 in your example) , then I guess this problem is even NP-complete.
Somebody please correct me if my intuition is lying to me.
So I think even the generation of all possible ASTs (assuming only 1 operation
is allowed) is NP complete as their count is not polynomial. Not to talk that more
than 1 operation is allowed here and not to talk of the minimal difference requirement (between result and desired result).
1. Fast entirely online algorithm
The idea is to search not for a single expression for target value,
but for an equation where target value is included in one part of the equation and
both parts have almost equal number of operations (2 and 3).
Since each part of the equation is relatively small, it does not take much time to
generate all possible expressions for given input values.
After both parts of equation are generated it is possible to scan a pair of sorted arrays
containing values of these expressions and find a pair of equal (or at least best matching)
values in them. After two matching values are found we could get corresponding expressions and
join them into a single expression (in other words, solve the equation).
To join two expression trees together we could descend from the root of one tree
to "target" leaf, for each node on this path invert corresponding operation
('*' to '/', '/' to '*' or '/', '+' to '-', '-' to '+' or '-'), and move "inverted"
root node to other tree (also as root node).
This algorithm is faster and easier to implement when all operations are invertible.
So it is best to use with floating point division (as in my implementation) or with
rational division. Truncating integer division is most difficult case because it produces same result for different inputs (42/25=1 and 25/25 is also 1). With zero-remainder integer division this algorithm gives result almost instantly when exact result is available, but needs some modifications to work correctly when approximate result is needed.
See implementation on Ideone.
2. Even faster approach with off-line pre-processing
As noticed by #WolframH, there are not so many possible input number combinations.
Only 3*3*(49+4-1) = 4455 if repetitions are possible.
Or 3*3*(49) = 1134 without duplicates. Which allows us to pre-process
all possible inputs off-line, store results in compact form, and when some particular result
is needed quickly unpack one of pre-processed values.
Pre-processing program should take array of 6 numbers and generate values for all possible
expressions. Then it should drop out-of-range values and find nearest result for all cases
where there is no exact match. All this could be performed by algorithm proposed by #Tim.
His code needs minimal modifications to do it. Also it is the fastest alternative (yet).
Since pre-processing is offline, we could use something better than interpreted Python.
One alternative is PyPy, other one is to use some fast interpreted language. Pre-processing
all possible inputs should not take more than several minutes.
Speaking about memory needed to store all pre-processed values, the only problem are the
resulting expressions. If stored in string form they will take up to 4455*999*30 bytes or 120Mb.
But each expression could be compressed. It may be represented in postfix notation like this:
arg1 arg2 + arg3 arg4 + *. To store this we need 10 bits to store all arguments' permutations,
10 bits to store 5 operations, and 8 bits to specify how arguments and operations are
interleaved (6 arguments + 5 operations - 3 pre-defined positions: first two are always
arguments, last one is always operation). 28 bits per tree or 4 bytes, which means it is only
20Mb for entire data set with duplicates or 5Mb without them.
3. Slow entirely online algorithm
There are some ways to speed up algorithm in OP:
Greatest speed improvement may be achieved if we avoid trying each commutative operation twice and make recursion tree less branchy.
Some optimization is possible by removing all branches where the result of division operation is zero.
Memorization (dynamic programming) cannot give significant speed boost here, still it may be useful.
After enhancing OP's approach with these ideas, approximately 30x speedup is achieved:
from itertools import combinations
numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None
subsets = {}
def get_best(value, item):
global best_value, target, best_item
if value >= 0 and abs(target - value) < best_value:
best_value = abs(target - value)
best_item = item
return value, item
def compare_one(value, op, left, right):
item = ('(' + left + op + right + ')')
return get_best(value, item)
def apply_one(left, right):
yield compare_one(left[0] + right[0], '+', left[1], right[1])
yield compare_one(left[0] * right[0], '*', left[1], right[1])
yield compare_one(left[0] - right[0], '-', left[1], right[1])
yield compare_one(right[0] - left[0], '-', right[1], left[1])
if right[0] != 0 and left[0] >= right[0]:
yield compare_one(left[0] / right[0], '/', left[1], right[1])
if left[0] != 0 and right[0] >= left[0]:
yield compare_one(right[0] / left[0], '/', right[1], left[1])
def memorize(seq):
fs = frozenset(seq)
if fs in subsets:
for x in subsets[fs].items():
yield x
else:
subsets[fs] = {}
for value, item in try_all(seq):
subsets[fs][value] = item
yield value, item
def apply_all(left, right):
for l in memorize(left):
for r in memorize(right):
for x in apply_one(l, r):
yield x;
def try_all(seq):
if len(seq) == 1:
yield get_best(numbers[seq[0]], str(numbers[seq[0]]))
for length in range(1, len(seq)):
for x in combinations(seq[1:], length):
for value, item in apply_all(list(x), list(set(seq) - set(x))):
yield value, item
for x, y in try_all([0, 1, 2, 3, 4, 5]): pass
print best_item
More speed improvements are possible if you add some constraints to the problem:
If integer division is only possible when the remainder is zero.
If all intermediate results are to be non-negative and/or below 1000.
Well I don't will give up. Following the line of all the answers to your question I come up with another algorithm. This algorithm gives the solution with a time average of 3 milliseconds.
#! -*- coding: utf-8 -*-
import copy
numbers = [4, 8, 6, 2, 15, 50]
target = 590
operations = {
'+': lambda x, y: x + y,
'-': lambda x, y: x - y,
'*': lambda x, y: x * y,
'/': lambda x, y: y == 0 and 1e30 or x / y # Handle zero division
}
def chain_op(target, numbers, result=None, expression=""):
if len(numbers) == 0:
return (expression, result)
else:
for choosen_number in numbers:
remaining_numbers = copy.copy(numbers)
remaining_numbers.remove(choosen_number)
if result is None:
return chain_op(target, remaining_numbers, choosen_number, str(choosen_number))
else:
incomming_results = []
for key, op in operations.items():
new_result = op(result, choosen_number)
new_expression = "%s%s%d" % (expression, key, choosen_number)
incomming_results.append(chain_op(target, remaining_numbers, new_result, new_expression))
diff = 1e30
selected = None
for exp_result in incomming_results:
exp, res = exp_result
if abs(res - target) < diff:
diff = abs(res - target)
selected = exp_result
if diff == 0:
break
return selected
if __name__ == '__main__':
print chain_op(target, numbers)
Erratum: This algorithm do not include the solutions containing parenthesis. It always hits the target or the closest result, my bad. Still is pretty fast. It can be adapted to support parenthesis without much work.
Actually there are two things that you can do to speed up the time to milliseconds.
You are trying to find a solution for given quiz, by generating the numbers and the target number. Instead you can generate the solution and just remove the operations. You can build some thing smart that will generate several quizzes and choose the most interesting one, how ever in this case you loose the as close as possible option.
Another way to go, is pre-calculation. Solve 100 quizes, use them as build-in in your application, and generate new one on the fly, try to keep your quiz stack at 100, also try to give the user only the new quizes. I had the same problem in my bible games, and I used this method to speed thing up. Instead of 10 sec for question it takes me milliseconds as I am generating new question in background and always keeping my stack to 100.
What about Dynamic programming, because you need same results to calculate other options?