I'm studying parsing with Python. I have user-defined instructions, so I have to specify their precedence. I found an example (here is the link),
but I don't understand what this part does:
precedence = (
('left','PLUS','MINUS'),
('left','TIMES','DIVIDE'),
('right','UMINUS'),
)
How does Python prioritize them?
And I don't understand these either:
def p_statement_assign(t):
    'statement : NAME EQUALS expression'
    names[t[1]] = t[3]

def p_statement_expr(t):
    'statement : expression'
    print(t[1])
What does it mean to write 'statement : expression' in quotation marks? How does Python understand and make sense of it?
I'm adding my instructions too. I will use them to draw something in my program:
F n -> go forward n steps
R n -> turn right n degrees
L n -> repeat the commands in parentheses n times
COLOR f -> f: line color
PEN n -> line thickness
These strings are read by PLY, not interpreted by Python itself. Any Python function, class, or module can have a string written at the beginning of its body, called a docstring, and you can use the __doc__ attribute to retrieve it. PLY cleverly uses docstrings as annotations to define the parsing rules. The rule statement : NAME EQUALS expression can be read as follows: if the token stream matches a NAME, then an EQUALS sign, and finally an expression, that sequence is reduced to a statement.
The same goes for the precedence variable, which PLY also reads and uses to define the precedence rules. Each entry gives an associativity ('left' or 'right') followed by token names, and entries listed later have higher precedence, so TIMES and DIVIDE bind more tightly than PLUS and MINUS, and UMINUS binds most tightly of all.
I recommend you read the ply documentation before using it as you need to know the basics about tokenizing and parsing before you can use a compiler construction tool like ply.
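To make the docstring and precedence mechanics above concrete, here is a minimal sketch that uses only the standard library and does not import PLY. The helpers collect_rules and precedence_levels are hypothetical names invented for illustration; they only mimic, in a very simplified way, the kind of information PLY extracts, and are not PLY's actual implementation.

```python
def p_statement_assign(t):
    'statement : NAME EQUALS expression'

def p_statement_expr(t):
    'statement : expression'

precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

def collect_rules(functions):
    """Hypothetical helper: read each function's docstring as 'lhs : rhs'."""
    rules = []
    for func in functions:
        lhs, rhs = func.__doc__.split(':', 1)
        rules.append((func.__name__, lhs.strip(), rhs.split()))
    return rules

def precedence_levels(table):
    """Hypothetical helper: map each token to (level, associativity).

    Later rows of the table get a larger level, i.e. bind more tightly.
    """
    levels = {}
    for level, (assoc, *tokens) in enumerate(table, start=1):
        for token in tokens:
            levels[token] = (level, assoc)
    return levels

rules = collect_rules([p_statement_assign, p_statement_expr])
levels = precedence_levels(precedence)

for name, lhs, rhs in rules:
    print(f"{name}: {lhs} -> {' '.join(rhs)}")
print(levels['PLUS'], levels['TIMES'], levels['UMINUS'])
```

PLY itself finds the rule functions by their p_ prefix rather than taking an explicit list; the list is passed here only to keep the sketch short.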
I wanted to know why it shows an error when I try this:
print(f"equal to: {lambda h, c : sqrt(h**2 + c**2)}") # error
and it worked when I tried this:
print("equal to", lambda h, c : sqrt(h**2 + c**2)) # doesn't show error
Why is there an error on the first one? Is it a bug or not? Is it not possible to use lambda expressions within an f-string?
extra details: I was using Visual Studio Code and it showed: Expected ":" Pylance
From the documentation:
Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions. An empty expression is not allowed, and both lambda and assignment expressions := must be surrounded by explicit parentheses. [...]
>>> print(f"equal to: {(lambda h, c : sqrt(h**2 + c**2))}")
equal to: <function <lambda> at 0x10228af70>
As to why, I suspect the amount of proper parsing that can be done during lexical analysis is limited, and so the parentheses are needed to help the parser.
(The above link includes a snippet of the grammar used to define f-strings, which hints at the differences between replacement fields and arbitrary Python expressions, but I will beg off trying to provide an explanation for them.)
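As a small check of the rule quoted above, the sketch below compiles both spellings and then actually calls the parenthesized lambda inside the replacement field. The helper name compiles and the sample arguments 3 and 4 are chosen only for illustration.

```python
from math import sqrt

def compiles(source):
    """Return True if source is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

bare = 'f"equal to: {lambda h, c: sqrt(h**2 + c**2)}"'
wrapped = 'f"equal to: {(lambda h, c: sqrt(h**2 + c**2))}"'

print(compiles(bare))     # False: the first ':' is taken as the start of a format spec
print(compiles(wrapped))  # True: the parentheses keep the whole lambda together

# Calling the lambda inside the field formats its result instead of the function object.
message = f"equal to: {(lambda h, c: sqrt(h**2 + c**2))(3, 4)}"
print(message)  # equal to: 5.0
```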
I'm trying to learn how to deobfuscate some code that is unnecessarily complicated. For example, I would like to be able to rewrite this line of code:
return ('d' + chr(101) + chr(97) + chr(200 - 100)) # returns 'dead'
to:
return 'dead'
So basically, I need to evaluate all literals within the py file, including complicated expressions that evaluate to simple integers. How do I go about writing this reader / is there something that exists that can do this? Thanks!
What you want is a program transformation system (PTS).
This is a tool for parsing source code to an AST, transforming the tree, and then regenerating valid source code from the tree. See my SO answer on rewriting Python text for some background.
With a PTS like (my company's) DMS Software Reengineering Toolkit, you can write rules to do constant folding, which essentially means doing compile-time arithmetic.
For the example you show, the following rules can accomplish OP's example:
rule fold_subtract_naturals(n:NATURAL,m:NATURAL): sum->sum =
    " \n - \m " -> " \subtract_naturals\(\n\,\m\) ";
rule convert_chr_to_string(c:NATURAL): term->term =
    " chr(\c) " -> make_string_from_natural(c) ;
rule convert_character_literal_to_string(c:CHARACTER): term->term =
    " \c " -> make_string_from_character(c) ;
rule fold_concatenate_strings(s1:STRING, s2:STRING): sum->sum =
    " \s1 + \s2 " -> " \concatenate_strings\(\s1\,\s2\) ";
ruleset fold_strings = {
    fold_subtract_naturals,
    convert_chr_to_string,
    convert_character_literal_to_string,
    fold_concatenate_strings };
Each of the individual rules matches corresponding syntax/trees. They are written in such a way that they only apply to literal constants.
fold_subtract_naturals finds pairs of NATURAL constants joined by a subtract operation, and replaces them with the difference, using a built-in function that subtracts two values and produces a literal value node containing the result.
convert_chr_to_string converts chr(c) to the corresponding string literal.
convert_character_literal_to_string converts 'C' to the corresponding string "C".
fold_concatenate_strings combines two literal strings separated by an add operator. It works analogously to the way that fold_subtract_naturals works.
subtract_naturals and concatenate_strings are built into DMS. The helpers make_string_from_natural and make_string_from_character, used by convert_chr_to_string and convert_character_literal_to_string, need to be custom-coded in DMS's metaprogramming language, PARLANSE, but these routines are pretty simple (maybe 10 lines).
The ruleset packages up the set of rules so they can all be applied.
Not shown is the basic code to open a file, call the parser, invoke the ruleset transformer (which applies rules until no rule applies). The last step is to call the prettyprinter to reprint the modified AST.
Many other PTS offer similar facilities.
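For comparison, the same parse, transform, and regenerate cycle can be sketched in plain Python with the standard library's ast module (ast.unparse needs Python 3.9 or later). This is not DMS and not the rule language shown above; ConstantFolder and fold are hypothetical names, only +, - and * on literals and chr() of an integer literal are folded, and the sketch assumes chr has not been rebound in the obfuscated code. Comments and original formatting are lost when the tree is unparsed.

```python
import ast
import operator

# Binary operators this sketch is willing to evaluate at "compile time".
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _BINOPS.get(type(node.op))
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = op(node.left.value, node.right.value)
            except Exception:
                return node  # e.g. 'a' - 1: leave the expression untouched
            return ast.copy_location(ast.Constant(value), node)
        return node

    def visit_Call(self, node):
        self.generic_visit(node)  # fold the arguments first
        if (isinstance(node.func, ast.Name) and node.func.id == "chr"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, int)):
            try:
                return ast.copy_location(ast.Constant(chr(node.args[0].value)), node)
            except (ValueError, OverflowError):
                return node  # out-of-range code point: leave as is
        return node

def fold(source):
    """Parse source, fold constants, and return regenerated source text."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(fold("x = ('d' + chr(101) + chr(97) + chr(200 - 100))"))  # x = 'dead'
```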
I'm developing a translator for translating simple script on PC to some bytecode to execute it (the bytecode) on a microcontroller.
I've developed the translator in C++ using lex and re2c, but I'm considering switching to pyparsing.
In order to translate a statement of my script to a few operations in bytecode, I need to get the Abstract Syntax Tree of that statement.
For example, this script:
X = 1 - 2;
Should be translated to binary equivalent of this:
register1 <- 1
register2 <- 2
register3 <- register1 - register2
x <- register3
I've got this python code:
integer = Combine( number )
ident = Word(alphas,alphanums)
expr = Forward()
atom = ( integer |
         ( lpar + expr.suppress() + rpar )
       )
expr << ( atom + (addop | multop) + atom )
statement = ident + assign + expr
L = statement.parseString( line )
Is there an example of visiting the leaves of the AST in L? Or something similar to that...
Thanks in advance
Your current parser will just give you a flat list of parsed tokens, since that is the default in pyparsing. The purpose is so that, regardless of how you build up your parser, whether in smaller pieces and then put them all together, or just in one giant statement, the tokens you get from parsing are structured (or not structured) the same. To get something akin to an AST, you need to define where you want structure using pyparsing's Group class (and I recommend using results names as well). So for example if you change statement to:
statement = Group(ident("lhs") + '=' + Group(expr)("rhs"))
Then your output will be much more predictable - every parsed statement will have 3 top-level elements - the target identifier (addressable as result.lhs), the '=' operator, and the source expression (addressable as result.rhs). The source expression may have further structure to it, but overall there will always be these 3 at the top-most level in every statement.
To ensure the parenthetical groups in your RHS expression are retained when evaluating your expr, again, use a Group:
atom = (integer | Group(lpar + expr + rpar))
You can navigate the hierarchical structure of the parsed results as if you were walking a list of nested lists.
But I would also encourage you to look at the SimpleBool example on the pyparsing wiki. In this example, the various parsed expressions get rendered into instances of classes which then can be processed using a visitor or just an iterator, and each class can then implement its own special logic for emitting your bytecode. Imagine that you had written a typical parser to generate an AST, then walked the AST to create CodeGenerator objects which subclass into AssignmentCodeGenerator or IfCodeGenerator or PrintCodeGenerator classes, and then walked this structure to create your bytecode. Instead, you can define assignment, if-then-else, or print statement expressions in pyparsing, have pyparsing create the classes directly, and then walk the classes to create the bytecode. In the end, your code is neatly organized into different statement types, and each type encapsulates the type of bytecode that it should output.
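To illustrate walking grouped results as nested lists, here is a minimal sketch that does not import pyparsing. It assumes the parsed statement has already been converted to plain nested lists of the shape [target, '=', [left, op, right]], with parentheses suppressed; emit_expr and emit_statement are hypothetical names, and the textual "register" instructions simply mirror the pseudo-bytecode from the question.

```python
from itertools import count

def emit_expr(node, code, regs):
    """Append instructions for node to code; return the register holding its value."""
    if isinstance(node, str):
        # Leaf: a literal or identifier is loaded into a fresh register.
        reg = f"register{next(regs)}"
        code.append(f"{reg} <- {node}")
        return reg
    # Inner node: assumed to be a [left, operator, right] group.
    left, op, right = node
    left_reg = emit_expr(left, code, regs)
    right_reg = emit_expr(right, code, regs)
    reg = f"register{next(regs)}"
    code.append(f"{reg} <- {left_reg} {op} {right_reg}")
    return reg

def emit_statement(stmt):
    """Translate [target, '=', expression] into a list of instruction strings."""
    target, _assign, expr = stmt
    code = []
    result = emit_expr(expr, code, count(1))
    code.append(f"{target} <- {result}")
    return code

for line in emit_statement(["X", "=", ["1", "-", "2"]]):
    print(line)
```

The class-based approach described above would replace the isinstance checks with one class per statement or expression type, each knowing how to emit its own bytecode.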
I am new to Python and am trying to write a calculator program. I have been trying to do the following but with no success, so please point me in the right direction:
I would like to input an equation as a user, for example:
f(t) = 2x^5 + 8
the program should recognize the different parts of a string and in this case make a variable f(t) and assign 2x^5 + 8 to it.
Though, if I input an equation followed by an equals sign, for example
2x^5 + 8 =
the program will instead just output the answer.
I am not asking how to code for the math-logic of solving the equation, just how to get the program to recognize the different parts of a string and make decisions accordingly.
I am sorry I don't have any code to show as an attempt as I'm not sure how to go about this and am looking for a bit of help to get started.
Thank you.
For a little bit of context: The problem you're describing is more generally known as parsing, and it can get rather complicated, depending on the grammar. The grammar is the description of the language; the language, in your case, is the set of all valid formulas for your calculator.
The first recommended step, even before you start coding, is to formalize your grammar. This is mainly for your own benefit, as it will make the programming easier. A well established way to do this is to describe the grammar using EBNF, and there exist tools like PLY for Python that you can use to generate parsers for such languages.
Let's try a simplified version of your calculator grammar:
digit := "0" | "1" # our numbers are in binary
number := digit | number digit # these numbers are all nonnegative
variable := "x" | "y" # we recognize two variable names
operator := "+" | "-" # we could have more operators
expression := number | variable | "(" expression operator expression ")"
definition := variable "=" expression
evaluation := expression "="
Note that there are multiple problems with this grammar. For example:
What about whitespace?
What about negative numbers?
What do you do about inputs like x = x (this is a valid definition)?
The first two are probably problems with the grammar itself, while the last one might need to be handled at a later stage (is the language perhaps context sensitive?).
In any case, given such a grammar, a tool like PLY can generate a parser for you, leaving it up to you to handle any additional logic (like x = x). First, however, I'd suggest you try to implement it on your own. One idea is to write a so-called top-down parser using recursion.
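A hand-written top-down (recursive descent) parser for the simplified grammar above could look roughly like the sketch below. The function names and the tuple-based tree are invented for illustration, and one assumption goes beyond the grammar: spaces are simply stripped before parsing. Negative numbers and the meaning of x = x are still left open, as noted above.

```python
def parse_expression(s, i):
    """Parse one expression starting at s[i]; return (tree, index after it)."""
    if i >= len(s):
        raise SyntaxError("unexpected end of input")
    ch = s[i]
    if ch in "01":                      # number := digit | number digit (binary)
        j = i
        while j < len(s) and s[j] in "01":
            j += 1
        return ("number", int(s[i:j], 2)), j
    if ch in "xy":                      # variable := "x" | "y"
        return ("variable", ch), i + 1
    if ch == "(":                       # "(" expression operator expression ")"
        left, i = parse_expression(s, i + 1)
        if i >= len(s) or s[i] not in "+-":
            raise SyntaxError(f"expected operator at position {i}")
        op = s[i]
        right, i = parse_expression(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise SyntaxError(f"expected ')' at position {i}")
        return ("binop", op, left, right), i + 1
    raise SyntaxError(f"unexpected character {ch!r} at position {i}")

def parse_line(line):
    """Classify a line as a definition or an evaluation and return its tree."""
    s = line.replace(" ", "")           # assumption: whitespace is insignificant
    # definition := variable "=" expression (needs something after the "=")
    if len(s) > 2 and s[0] in "xy" and s[1] == "=":
        tree, i = parse_expression(s, 2)
        if i != len(s):
            raise SyntaxError(f"unexpected trailing input at position {i}")
        return ("definition", s[0], tree)
    # evaluation := expression "="
    tree, i = parse_expression(s, 0)
    if s[i:] != "=":
        raise SyntaxError('expected a single "=" at the end')
    return ("evaluation", tree)

print(parse_line("x = (1 + 10)"))
print(parse_line("(x - 1) ="))
```

Each grammar rule maps to one branch or function, which is what makes the top-down style easy to write by hand for small grammars.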
I am converting some MATLAB code to C. Currently I have some lines that contain powers written with ^, which are rather easy to handle with something along the lines of \(?(\w*)\)?\^\(?(\w*)\)?
This works fine for converting (glambda)^(galpha), using the sub routine in Python: pattern.sub(r'pow(\g<1>,\g<2>)', '(glambda)^(galpha)')
My problem comes with nested parentheses.
So I have a string like:
glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);
And I cannot figure out how to convert that line to:
pow(glambdastar,(1-(1-gphi)*galpha))*pow(glambdaq,(-(1-gphi)*galpha));
Unfortunately, regular expressions aren't the right tool for handling nested structures. There are some regular expression engines (such as .NET) which have some support for recursion, but most, including the Python engine, do not, and can only handle as many levels of nesting as you build into the expression (which gets ugly fast).
What you really need for this is a simple parser. For example, iterate over the string counting parentheses and storing their locations in a list. When you find a ^ character, put the most recently closed parenthesis group into a "left" variable, then watch the group formed by the next opening parenthesis. When it closes, use it as the "right" value and print the pow(left, right) expression.
I think you can use recursion here.
Once you figure out the Left and Right parts, pass each of those to your function again.
The base case would be that no ^ operator is found, so you will not need to add the pow() function to your result string.
The function will return a string with all the correct pow()'s in place.
I'll come up with an example of this if you want.
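A rough sketch of the scan-and-replace idea in Python follows. Instead of explicit recursion it repeatedly rewrites the leftmost ^, which reaches nested operators on later passes. The names caret_to_pow, _left_start and _right_end are made up for this example, and it assumes there are no spaces around ^, that operands are either names/numbers, function calls, or parenthesized groups, and that only the scalar ^ operator appears (MATLAB's elementwise .^ and numbers like 1e-5 are not handled).

```python
def _is_word(ch):
    return ch.isalnum() or ch in "_."

def _left_start(s, caret):
    """Index where the operand ending just before s[caret] begins."""
    i = caret
    if i > 0 and s[i - 1] == ")":
        depth = 0
        while i > 0:                    # walk left to the matching "("
            i -= 1
            if s[i] == ")":
                depth += 1
            elif s[i] == "(":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced parentheses before '^'")
    while i > 0 and _is_word(s[i - 1]):  # a name, or a function name like pow(...)
        i -= 1
    if i == caret:
        raise ValueError("missing left operand for '^'")
    return i

def _right_end(s, caret):
    """Index just past the operand that starts right after s[caret]."""
    j = caret + 1
    if j < len(s) and s[j] in "+-":      # allow a unary sign, e.g. x^-2
        j += 1
    start = j
    while j < len(s) and _is_word(s[j]):
        j += 1
    if j < len(s) and s[j] == "(":
        depth = 0
        while j < len(s):               # walk right to the matching ")"
            if s[j] == "(":
                depth += 1
            elif s[j] == ")":
                depth -= 1
                if depth == 0:
                    j += 1
                    break
            j += 1
        else:
            raise ValueError("unbalanced parentheses after '^'")
    if j == start:
        raise ValueError("missing right operand for '^'")
    return j

def caret_to_pow(s):
    """Rewrite every a^b in s as pow(a,b), leftmost operator first."""
    while "^" in s:
        caret = s.index("^")
        start, end = _left_start(s, caret), _right_end(s, caret)
        s = f"{s[:start]}pow({s[start:caret]},{s[caret + 1:end]}){s[end:]}"
    return s

print(caret_to_pow("glambdastar^(1-(1-gphi)*galpha)*(glambdaq)^(-(1-gphi)*galpha);"))
```

The operands keep their original parentheses, so the output contains a few redundant pairs such as pow((glambdaq),...); that is harmless in C. Because the leftmost ^ is rewritten first, a^b^c becomes pow(pow(a,b),c), matching MATLAB's left-to-right evaluation.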
Nested parentheses cannot be described by a regexp and require a full parser (one able to understand a grammar, which is something more powerful than a regexp). I do not think there is a regexp-only solution.
See recent discussion function-parser-with-regex-in-python (one of many similar discussions). Then follow the suggestion to pyparsing.
An alternative would be to iterate until all ^ have been exhausted, no?
Ruby code:
# assuming str contains the string of data with the expressions you wish to convert
while str.include?('^')
  # gsub! returns nil when nothing matched; stop then to avoid looping forever
  break unless str.gsub!(/(\w+)\^(\w+)/, 'pow(\1,\2)')
end