I searched for an answer but couldn't find a clear one. Please bear with me, as I'm kind of a noob at regex and this is my first question. I'm using Python 3, but I will also need this for JavaScript.
What I'm trying to do is validate an input by the user. The input is an inequality (spaces removed), and the variables are named by the user and given beforehand.
For example, let's say I have this inequality:
x+y+6p<=z+1
The variables x, y, p, z will be given. The problem arises when the inequality looks like this:
xp+yp+6p<=z+1
The given variables are xp, yp, p, and z.
I'm trying to write a regular expression to match any inequality with such a format, given no spaces in the inequality. I cannot figure out how to check for alternative strings. For example I wrote the following expression:
^([\+\-]?[0-9]*([xpypz]|[0-9]+))+[<>]=([\+\-]?[0-9]*([xpypz]|[0-9]+))+$
I know this is completely wrong and that's not how the parentheses are used, but I don't have a feasible expression and I wanted to show you what I want to achieve. Now I need to know three things (at least, I hope) to fix it:
How to check specifically for xp, and yp as they are literally instead of all characters in the set xypz?
How to make 0-9 after xpypz work as [0-9]+? Meaning that any number can occur instead of a variable?
How can I make the whole group repeat?
I'm trying to write this expression to check whether the user is adding undeclared variables. I believe this can be done differently without using regex, but it would be nice to do it in a single line. Can you please help me figure out those three points? Thanks.
Try this pattern:
(^(?=.)(?:(?:[+-]?\d*(?:xp|yp|p|z)*)+)[<>]=(?=.)(?:(?:[+-]?\d*(?:xp|yp|p|z)*)+)$)
[0-9]*(xp|yp|p|z)*([+-][0-9]*(xp|yp|p|z)*)*(<|>|<=|>=)[0-9]*(xp|yp|p|z)*([+-][0-9]*(xp|yp|p|z)*)*
This is ugly and won't catch mistakes like 1++x<p nor does it allow for other functions like sin or exponents. It matches on xp+yp+6p<=z+1 but does not on xp+yp+6x<=z+1 if xp, yp, p, and z are the variables given.
As Greg Ball mentioned, though, the best thing would be to use parsing if possible. Then you could catch more syntax errors besides the use of wrong variables, and you could do so more reliably.
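The idea in the patterns above (an alternation of the literal variable names instead of a character set) can also be generated from the declared names. The following is only a sketch under a few assumptions: the helper names are made up for illustration, the variable list is non-empty, each term is an optional coefficient followed by at most one variable, and only +, -, <, >, <= and >= are allowed.

```python
import re

def build_inequality_pattern(variables):
    # Hypothetical helper: build the regex from the declared names.
    # Longest names first so "xp" is tried before "p"; re.escape protects
    # names that contain regex metacharacters.
    names = "|".join(re.escape(v) for v in sorted(variables, key=len, reverse=True))
    # A term is a number, a number followed by one variable, or a bare variable.
    term = rf"(?:\d+(?:{names})?|(?:{names}))"
    # One side: optional leading sign, then terms separated by single + or -.
    side = rf"[+-]?{term}(?:[+-]{term})*"
    return re.compile(rf"{side}(?:<=|>=|<|>){side}")

def is_valid_inequality(text, variables):
    # fullmatch anchors the pattern at both ends, so no ^ or $ is needed.
    return build_inequality_pattern(variables).fullmatch(text) is not None

print(is_valid_inequality("xp+yp+6p<=z+1", ["xp", "yp", "p", "z"]))
print(is_valid_inequality("xp+yp+6x<=z+1", ["xp", "yp", "p", "z"]))
```

Because every operator must be followed by a term, this version also rejects inputs such as 1++x<p, which the hand-written patterns let through.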
Related
Is there a way to check satisfiability of a python string like 'p or p -> p' in Z3 if you do not know the variable names before hand?
For example I have seen this:
p = Bool('p')
solve(Implies(Or(p, p), p))
However I cannot define the variables in Z3 in advance because the proposition is given to me as a string. How can I do this with z3?
I have also seen Python's eval function, but it seems I would need to have the variable names defined in z3 before using that too.
Some questions to ponder: What would be the meaning of that string? What if it has syntax-errors in it? How do you discern what are the valid operators/variables? Do you allow just booleans, or other sorts as well? What about grouping, precedence, and associativity of operators?
Bottom line, if you want to go directly from a string, you really have no choice but to agree on a syntax and a semantics of what those strings mean. And the only way to do that is to write a parser for those strings, and "interpret" that result in the z3 context.
One choice is to "stick" to SMTLib, i.e., ask your input to be well-formatted SMTLib scripts. If you go with this choice, then z3 already has a built-in parser for them that you can readily use. See here: https://z3prover.github.io/api/html/namespacez3py.html#a09fe122cbfbc6d3fa30a79850b2a2414 But I'm pretty sure you'll find this rather ugly and not quite what you wanted. But this is the only "out-of-the-box" solution.
The proper way to handle this issue is to write a basic parser over boolean expressions, whose syntax (and to some extent semantics) you'll have the freedom to define however you want. Also, this isn't a particularly difficult thing to do. If you're doing this in Python, you can use ply (https://www.dabeaz.com/ply/), or go with a hand-written recursive-descent parser (https://www.booleanworld.com/building-recursive-descent-parsers-definitive-guide/).
Feel free to explore and ask further questions; though make sure to tag them appropriately if it's about parsing strings in Python, which really has nothing to do with z3/z3py.
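To give an idea of the size of such a parser, here is a minimal hand-written recursive-descent sketch. The syntax is an assumption made for illustration (keywords not, and, or, a right-associative ->, parentheses, identifiers as variables), and the nested-tuple output format is invented here, not anything z3 defines. A later pass over the tuples could create a Bool for each name found and apply the matching z3py constructors (Or, Implies, ...) from the question.

```python
import re

KEYWORDS = {"or", "and", "not"}
TOKEN = re.compile(r"\s*(->|\(|\)|[A-Za-z_]\w*)")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def implies():            # implies := or ('->' implies)?   (right-assoc)
        left = disjunction()
        if peek() == "->":
            advance()
            return ("implies", left, implies())
        return left

    def disjunction():        # or := and ('or' and)*
        node = conjunction()
        while peek() == "or":
            advance()
            node = ("or", node, conjunction())
        return node

    def conjunction():        # and := not ('and' not)*
        node = negation()
        while peek() == "and":
            advance()
            node = ("and", node, negation())
        return node

    def negation():           # not := 'not' not | atom
        if peek() == "not":
            advance()
            return ("not", negation())
        return atom()

    def atom():               # atom := NAME | '(' implies ')'
        tok = peek()
        if tok == "(":
            advance()
            node = implies()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is None or tok in KEYWORDS or tok in (")", "->"):
            raise SyntaxError(f"expected a variable, got {tok!r}")
        advance()
        return ("var", tok)

    tree = implies()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def variables(tree):
    # Collect every variable name, so they can be declared before solving.
    if tree[0] == "var":
        return {tree[1]}
    return set().union(*(variables(child) for child in tree[1:]))

print(parse("p or p -> p"))
print(variables(parse("not (a and b) -> c")))
```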
So what I am trying to do is write a script that lets me input some function and a list of the variables inside it, then processes it into some other formula, computes a result, and then outputs both the new formula and the result as LaTeX code. Everything works fine as long as I only input variables which do not contain "^", "{", or "}". The problem is, I want to use, or, at the very least, output the names exactly as they are written in my LaTeX document, and as such they do often contain these characters.
I am aware that there is a built-in Latex-Parser in Sympy, but as I understood it requires some other package (antlr4), and I would like to try to avoid that, since I am planning to distribute the script to my fellow students, and don't want to add another requirement for running the script.
So what I thought of is that I could use the list of variable names (which I input anyway together with their values to allow the program to compute a final result): I tried to define a "transformation", as it is described on the Sympy documentation on parsing. It looks like this:
#Defining the transformation
def can_split(symbol):
    #Check if symbol is in one of the lists of inputted values (the two lists contain tuples of variable names[0] and their corresponding values[1])
    if symbol not in ([i[0] for i in uncertainValues]+[i[0] for i in certainValues]):
        return _token_splittable(symbol)
    return False

#Read function definition from TKinter text field, split only by custom defined symbols
function=parse_expr(functionEntry.get("1.0", "end-1c"),transformations = (split_symbols_custom(can_split)))
The problem is that if I run this script and input e.g. "a^b*c" with the variable names "a^b" and "c", which should normally be read as "the variable 'a^b' multiplied by the variable 'c'", I get the exception: "NameError: name 'a' is not defined".
If anyone could help me with this, or maybe propose another way to do this properly, I would be very thankful. Also, if more code or context is needed to find a better solution, I'll provide more. I just felt everything would get too long-winded if I explained the whole idea. But as I said, I'll be glad to do that if it helps.
Quick but dirty workaround:
For now I ended up using the dirty method of replacing all problematic characters with unique strings at input, and replacing them with their symbols again before outputting.
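A rough sketch of that kind of workaround, using only the standard library, might look as follows. Note that it substitutes whole variable names rather than individual characters, which is a variation on what is described above. The placeholder scheme (__sym0__, __sym1__, ...) and the function name are made up for illustration and assume no real variable uses such a name.

```python
import re

def make_codec(names):
    # Hypothetical helper: returns (encode, decode) for the given names.
    # Longest names first, so that "a^b" wins over a shorter name like "a".
    ordered = sorted(names, key=len, reverse=True)
    to_placeholder = {name: f"__sym{i}__" for i, name in enumerate(ordered)}
    from_placeholder = {p: n for n, p in to_placeholder.items()}
    # The lookarounds keep a short name such as "c" from matching inside a
    # longer word such as "cos".
    name_re = re.compile(
        r"(?<![A-Za-z_])(?:"
        + "|".join(re.escape(n) for n in ordered)
        + r")(?![A-Za-z0-9_])"
    )
    placeholder_re = re.compile(r"__sym\d+__")

    def encode(text):
        # Single pass, so already inserted placeholders are never rescanned.
        return name_re.sub(lambda m: to_placeholder[m.group(0)], text)

    def decode(text):
        return placeholder_re.sub(
            lambda m: from_placeholder.get(m.group(0), m.group(0)), text
        )

    return encode, decode

encode, decode = make_codec(["a^b", "c"])
safe = encode("a^b*c")      # plain identifiers, safe to hand to a parser
print(safe)
print(decode(safe))         # original names restored for the output
```

The encoded string contains only ordinary identifiers, so it can be parsed and processed normally; decode is applied to the generated output at the very end.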
I'm hoping to match the beginning of a string differently based on whether a certain block of characters is present later in the string. A very simplified version of this is:
re.search("""^(?(pie)a|b)c.*(?P<pie>asda)$""", 'acaaasda')
Where, if <pie> is matched, I want to see a at the beginning of the string, and if it isn't then I'd rather see b.
I'd use normal numerical lookahead but there's no guarantee how many groups will or won't be matched between these two.
I'm currently getting error: unknown group name. The sinking feeling in my gut tells me that this is because what I want is impossible (look-ahead to named groups isn't exactly a feature of a regular language parser), but I really really really want this to work -- the alternative is scrapping 4 or 5 hours' worth of regex writing and redoing it all tomorrow as a recursive descent parser or something.
Thanks in advance for any help.
Unfortunately, I don't think there is a way to do what you want to do with named groups. If you don't mind duplication too much, you could duplicate the shared conditions and OR the expressions together:
^(ac.*asda|bc.*)$
If it is a complicated expression you could always use string formatting to share it (rather than copy-pasting the shared part):
common_regex = "c.*"
final_regex = "^(a{common}asda|b{common})$".format(common=common_regex)
You can use something like this:
^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$
or without .*$ if you don't need it.
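For comparison, here is how the two suggestions behave in Python's re module on a few sample strings. The sample inputs other than 'acaaasda' are made up for illustration.

```python
import re

# Lookahead version: the named group is captured inside the lookahead.
LOOKAHEAD = re.compile(r"^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$")

# Alternation version, with the shared part injected by string formatting.
COMMON = "c.*"
ALTERNATION = re.compile("^(a{common}asda|b{common})$".format(common=COMMON))

for sample in ["acaaasda", "bcxyz", "acxyz"]:
    m = LOOKAHEAD.search(sample)
    pie = m.group("pie") if m else None
    print(sample, bool(m), pie, bool(ALTERNATION.search(sample)))
```

With the lookahead pattern, a string starting with a only matches if asda appears at the end, and the pie group is still available afterwards; a string starting with b matches with pie left as None.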
I'm trying to create a calculator program in which the user can type an equation and get an answer. I don't want the full code for this, I just need help with a specific part.
The approach I am trying to take is to have the user input the equation as a string (raw_input) and then convert the numbers from their input to integers. After that I need to know how I can get the operators to do what I want them to do, depending on which operator the user uses and where it is in the equation.
What are some methods I might use to accomplish this task?
Here is basically what I have right now:
equation_number = raw_input("\nEnter your equation now: ")
[int(d) for d in equation_number if d.isdigit()]
Those lines are just for collecting input and attempting to convert the numbers into integers. Unfortunately, it does not seem to be working very well and .isdigit will only work for positive numbers anyway.
Edit- aong152 mentioned recursive parsing, which I looked into, and it appears to have desirable results:
http://blog.erezsh.com/how-to-write-a-calculator-in-70-python-lines-by-writing-a-recursive-descent-parser/
However, I do not understand the code that the author of this post is using, could anyone familiarize me with the basics of recursive parsing?
The type of program you are trying to make is probably more complicated than you think.
The first step would be separating the string into each argument.
Let's say that the user inputs:
1+2.0+3+4
Before you can even convert to ints, you are going to need to split the string up into its components:
1
+
2.0
+
3
+
4
This will require a recursive parser, which (seeing as you are new to Python) may be a bit of a hurdle.
Assuming that you now have each part separately as strings,
float("2.0") == 2.0
int(2.0) == 2
Here is a helper function:
def num(s):
    # Try a plain integer first; fall back to parsing a float and truncating it.
    try:
        return int(s)
    except ValueError:
        return int(float(s))
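The splitting step described above can be sketched with a regular expression. This is just one possible tokenizer, written for Python 3; the function name and the set of accepted characters (digits, a decimal point, + - * / and parentheses) are assumptions for illustration.

```python
import re

# Decimal numbers first, then integers, then single-character operators.
TOKEN_RE = re.compile(r"\d+\.\d*|\.\d+|\d+|[-+*/()]")

def tokenize(expr):
    expr = expr.replace(" ", "")
    tokens = TOKEN_RE.findall(expr)
    # findall silently skips what it cannot match, so check nothing was lost.
    if "".join(tokens) != expr:
        raise ValueError("unexpected character in %r" % expr)
    return tokens

print(tokenize("1+2.0+3+4"))
```

Each number token can then be passed to a conversion helper such as the one above (or to float, if decimals should be kept).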
Instead of raw_input just use input: raw_input returns a string, whereas in Python 2 input evaluates what the user types as a Python expression and returns the result (an int, for example).
This is a very simple calculator:
def calculate():
    x = input("Equation: ")
    print x

while True:
    calculate()
The function takes the input and prints it, then the while loop runs it again.
I'm not sure if this is what you want, but here you go. You should also add a way to end the loop.
After using raw_input() you can use eval() on the result to compute the value of this string. eval() evaluates any valid Python expression and returns the outcome.
But I think this is not to your liking. You probably want to do more by yourself.
So I think you should have a look at the re module to split the input into tokens (something like numbers and operators) using regular expressions. After this you should write a parser which gets the token stream as input. You should decide whether this parser shall just return the computed value (e.g. a number) or maybe an abstract syntax tree, i.e. a data structure which represents the expression in an object-oriented (instead of character-oriented) way. Such a syntax tree could then be evaluated to get the final result.
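To make the last point concrete, here is a small sketch of what evaluating such a tree could look like. The tree representation (nested tuples of the form (operator, left, right) with plain numbers as leaves) is just an assumption for illustration; a real parser might use classes instead.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(node):
    # Leaves are plain numbers; inner nodes are (op, left, right) tuples.
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    # Evaluate both children first, then combine them (bottom-up).
    return OPS[op](evaluate(left), evaluate(right))

tree = ("+", 1, ("*", 2, 3))   # represents 1 + 2 * 3
print(evaluate(tree))
```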
Are you familiar with regular expressions? If not, it's probably a good idea to first learn about them. They are the weak, non-recursive cousin of parsing. Don't go deep, just understand the building blocks — A then B, A many times, A or B.
The blog post you found is hard because it implements the parsing by hand. It's using recursive descent, which is the only way to write a parser by hand and keep your sanity, but it's still tricky.
What people do most of the time is only write a high level grammar and use a library (or code generator) to do the hard work of parsing.
Indeed he had an earlier post where he uses a library:
http://blog.erezsh.com/how-to-write-a-calculator-in-50-python-lines-without-eval/
At least the beginning should be very easy. Things to pay attention to:
How precedence arises from the structure of the grammar — add consists of muls, not vice versa.
The moment he adds a rule for parentheses:
atom: neg | number | '(' add ')';
This is where it really becomes recursive!
6-2-1 should parse as (6-2)-1, not 6-(2-1). He doesn't discuss it, but if you look carefully, it also arises from the structure of the grammar. Don't waste time on this; just know for future reference that this is called associativity.
The result of parsing is a tree. You can then compute its value in a bottom-up manner.
In the "Calculating!" chapter he does that, but in a sort of magic way.
Don't worry about that.
To build a calculator yourself, I suggest you strip the problem as much as possible.
Recognizing where numbers end etc. is a bit messy. It could be part of the grammar, or done by a separate pass called lexer or tokenizer.
I suggest you skip it — require the user to type spaces around all operators and parens. Or just assume you're already given a list of the form [2.0, "*", "(", 3.0, "+", -1.0, ")"].
Start with a trivial parser(tokens) function that only handles 3-element expressions — [number, op, number].
Return a single number, the result of the computation. (I previously said parsers output a tree which is processed later. Don't worry about that, returning a number is simpler.)
Write a function that expects either a number or parentheses; in the latter case it calls parser().
>>> number_or_expr([1.0, "rest..."])
(1.0, ["rest..."])
>>> number_or_expr(["(", 2.0, "+", 2.0, ")", "rest..."])
(4.0, ["rest..."])
Note that I'm now returning a second value - the remaining part of the input. Change parser() to also use this convention.
Now rewrite parser() to call number_or_expr() instead of directly assuming tokens[0] and tokens[2] are numbers.
Voilà! You now have a (mutually) recursive calculator that can compute anything; it just has to be written in verbose style with parens around everything.
Now stop and admire your code, for at least a day :-) It's still simple but has the essential recursive nature of parsing. And the code structure reflects the grammar 1:1 (which is the nice property of recursive descent. You don't want to know how the other algorithms look).
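For reference, the steps above could end up looking roughly like this. It is only one possible sketch: the names parser() and number_or_expr() and the token-list format come from the steps, while the OPS table and the error handling are assumptions added for illustration.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def number_or_expr(tokens):
    """Consume a number or a parenthesized expression; return (value, rest)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        value, rest = parser(rest)          # recursion happens here
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return value, rest[1:]
    if isinstance(head, (int, float)):
        return float(head), rest
    raise SyntaxError("expected a number or '(', got %r" % (head,))

def parser(tokens):
    """Consume operand, operator, operand; return (result, rest)."""
    left, rest = number_or_expr(tokens)
    if not rest or rest[0] not in OPS:
        raise SyntaxError("expected an operator")
    op, rest = rest[0], rest[1:]
    right, rest = number_or_expr(rest)
    return OPS[op](left, right), rest

print(parser([2.0, "*", "(", 3.0, "+", -1.0, ")"]))
```

Both functions follow the same convention of returning the value together with the unconsumed tokens, which is what lets them call each other.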
From here there are many possible improvements (support 2+2+2, allow (1), precedence...), but there are 2 ways to go about it:
Improve your code step by step. You'll have to refactor a lot.
Stop working hard and use a parsing library, e.g. pyparsing.
This will allow you to experiment with grammar changes faster.
I am converting some MATLAB code to C. Currently I have some lines that use powers with ^, which is rather easy to handle with something along the lines of \(?(\w*)\)?\^\(?(\w*)\)?
This works fine for converting (glambda)^(galpha) using the sub method in Python: pattern.sub(r'pow(\g<1>,\g<2>)', '(glambda)^(galpha)')
My problem comes with nested parentheses.
So I have a string like:
glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);
And I can not figure out how to convert that line to:
pow(glambdastar,(1-(1-gphi)*galpha))*pow(glambdaq,(-(1-gphi)*galpha));
Unfortunately, regular expressions aren't the right tool for handling nested structures. There are some regular expression engines (such as .NET) which have some support for recursion, but most (including the Python engine) do not, and can only handle as many levels of nesting as you build into the expression (which gets ugly fast).
What you really need for this is a simple parser. For example, iterate over the string counting parentheses and storing their locations in a list. When you find a ^ character, put the most recently closed parenthesis group into a "left" variable, then watch the group formed by the next opening parenthesis. When it closes, use it as the "right" value and print the pow(left, right) expression.
I think you can use recursion here.
Once you figure out the Left and Right parts, pass each of those to your function again.
The base case would be that no ^ operator is found, so you will not need to add the pow() function to your result string.
The function will return a string with all the correct pow()'s in place.
I'll come up with an example of this if you want.
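A possible sketch of the parenthesis-matching and recursion ideas described above, in Python. The helper names are made up for illustration, and some assumptions are baked in: operands are a name, a number, a function call, or a parenthesized group; an exponent may carry a leading sign; chained powers are grouped left to right; and one redundant pair of outer parentheses around an operand is dropped.

```python
def _is_word(ch):
    return ch.isalnum() or ch in "_."

def _match_forward(expr, start):
    """Index just past the ')' that closes the '(' at start."""
    depth = 0
    for k in range(start, len(expr)):
        if expr[k] == "(":
            depth += 1
        elif expr[k] == ")":
            depth -= 1
            if depth == 0:
                return k + 1
    raise ValueError("unbalanced parentheses")

def _match_backward(expr, end):
    """Index of the '(' that opens the ')' at end."""
    depth = 0
    for k in range(end, -1, -1):
        if expr[k] == ")":
            depth += 1
        elif expr[k] == "(":
            depth -= 1
            if depth == 0:
                return k
    raise ValueError("unbalanced parentheses")

def _left_start(expr, caret):
    j = caret - 1
    if j < 0:
        raise ValueError("missing base before '^'")
    if expr[j] == ")":
        j = _match_backward(expr, j)
    elif not _is_word(expr[j]):
        raise ValueError("missing base before '^'")
    # Extend over a name/number, or over a function name in front of a call.
    while j > 0 and _is_word(expr[j - 1]):
        j -= 1
    return j

def _right_end(expr, caret):
    j = caret + 1
    if j < len(expr) and expr[j] in "+-":   # signed exponent, e.g. x^-2
        j += 1
    start = j
    while j < len(expr) and _is_word(expr[j]):
        j += 1
    if j < len(expr) and expr[j] == "(":    # group or function call
        return _match_forward(expr, j)
    if j == start:
        raise ValueError("missing exponent after '^'")
    return j

def _strip_outer(operand):
    # Operands that start with '(' are always exactly one group here.
    if operand.startswith("(") and operand.endswith(")"):
        return operand[1:-1]
    return operand

def caret_to_pow(expr):
    caret = expr.find("^")
    if caret == -1:                         # base case: nothing to convert
        return expr
    ls, re_ = _left_start(expr, caret), _right_end(expr, caret)
    base = caret_to_pow(_strip_outer(expr[ls:caret]))
    exponent = caret_to_pow(_strip_outer(expr[caret + 1:re_]))
    # Rescan the result so that later or chained '^' are handled too.
    return caret_to_pow(f"{expr[:ls]}pow({base},{exponent}){expr[re_:]}")

print(caret_to_pow("glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);"))
```

Each call removes one ^ from the string, so the recursion ends once none are left.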
Nested parentheses cannot be described by a regexp and require a full parser (able to understand a grammar, which is something more powerful than a regexp). I do not think there is a regexp-only solution.
See recent discussion function-parser-with-regex-in-python (one of many similar discussions). Then follow the suggestion to pyparsing.
An alternative would be to iterate until all ^ have been exhausted, no?
Ruby code:
# assuming str contains the string of data with the expressions you wish to convert
while str.include?('^')
  # gsub! returns nil when nothing was replaced; stop then to avoid looping forever
  break unless str.gsub!(/(\w+)\^(\w+)/, 'pow(\1,\2)')
end