I'm new to Python and pyparsing, and I'm making a logic expression evaluator.
The formula must be a WFF. The BNF of WFF is:
<alpha set> ::= p | q | r | s | t | u | ...
(the arbitrary finite set of propositional variables)
<form> ::= <alpha set> | ¬<form> | (<form>V<form>) | (<form>^<form>)
| (<form> -> <form>) | (<form> <-> <form>)
My code is:
'''
Created on 17/02/2012
@author: Juanjo
'''
from pyparsing import *
from string import lowercase

def fbf():
    atom = Word(lowercase, max=1)       # alphabet
    op = oneOf('^ V => <=>')            # operators
    identOp = oneOf('( [ {')
    identCl = oneOf(') ] }')
    form = Forward()                    # recursive definition
    # Grammar:
    form << ( Group(Literal('~') + form)
            | Group(identOp + form + op + form + identCl)
            | Group(identOp + form + identCl)
            | atom )
    return form

entrada = raw_input("Input please: ")   # user input
print fbf().parseString(entrada)
The problem appears when I use expressions like a^b and aVb.
The parser should report an error, but there is no error; instead it returns a. In fact, everything after the first a is silently ignored.
The WFF versions of those formulas, (a^b) and (aVb), both work correctly.
I think the problem is in the atom definition.
What am I doing wrong?
By default parseString will just parse the beginning of the string.
You can force it to parse the entire string by changing the code to:
print fbf().parseString(entrada, parseAll=True)
Alternatively, you can end the grammar with the StringEnd() token; see the documentation for parseString at http://packages.python.org/pyparsing/ for more details.
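With parseAll=True the ill-formed inputs from the question are rejected. A self-contained sketch (the grammar is a lightly modernized copy of the question's fbf, using ascii_lowercase because string.lowercase is Python 2 only):

```python
from string import ascii_lowercase
from pyparsing import Forward, Group, Literal, Word, oneOf, ParseException

def fbf():
    atom = Word(ascii_lowercase, max=1)
    op = oneOf('^ V => <=>')
    lpar, rpar = oneOf('( [ {'), oneOf(') ] }')
    form = Forward()
    form <<= (Group(Literal('~') + form)
              | Group(lpar + form + op + form + rpar)
              | Group(lpar + form + rpar)
              | atom)
    return form

print(fbf().parseString('(a^b)', parseAll=True))   # a WFF: parses fine

try:
    fbf().parseString('a^b', parseAll=True)        # trailing '^b' is now rejected
except ParseException as err:
    print('not a WFF:', err)
```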
#-----------------------------------------------------------------------------------
from pprint import pprint
data = '''
.
.
.
#Long log file
-------------------------------------------------------------------------------
Section Name | Budget | Size | Prev Size | Overflow
--------------------------------+-----------+-----------+-----------+----------
.text.resident | 712924 | 794576 | 832688 | YES
.rodata.resident | 77824 | 77560 | 21496 | YES
.data.resident | 28672 | 28660 | 42308 | NO
.bss.resident | 52672 | 1051632 | 1455728 | YES
.
.
.
'''
Output expected:
MEMDICT = {'.text.resident' : {'Budget':'712924', 'Size':'794576', 'Prev Size': '832688' , 'Overflow': 'YES'},
'.rodata.resident' : {'Budget':'', 'Size':'', 'Prev Size': '' , 'Overflow': 'YES'},
'.data.resident' :{'Budget':'', 'Size':'', 'Prev Size': '' , 'Overflow': 'NO'},
'.bss.resident' :{'Budget':'', 'Size':'', 'Prev Size': '' , 'Overflow': 'YES'}}
I am a beginner in Python. Please suggest some simple steps.
Logic:
1. Search for a regex pattern and get the headers into a list:
   pattern = re.compile(r'\sSection Name\s|\sBudget*')  # this can be improved
   if pattern.match(line):
       key_list = (''.join(line.split())).split('|')  # unable to handle space issues, so trimmed and joined
2. Search for a regex pattern matching .something.resident | \d+ | \d+ | \d+ | ** and get it into value_list (need some help here).
3. Turn the lists into the dict in a loop:
   mem_info = {}  # reset the dict
   for i in range(0, len(key_list)):
       mem_info[key_list[i]] = value_list[i]
   MEMDICT[sta_info[0]] = sta_info
The only thing you haven't shown us is what line ends the section. Other than that, this is what you need:
keeper = False
memdict = {}
for line in open(file):
    if not keeper:
        if 'Section Name' in line:
            keeper = True
        continue
    if '-------------------' in line:
        continue
    if 'whatever ends the section' in line:
        break
    parts = [p.strip() for p in line.split('|')]   # split on '|', not whitespace
    memdict[parts[0]] = {
        'Budget': int(parts[1]),
        'Size': int(parts[2]),
        'Prev Size': int(parts[3]),
        'Overflow': parts[4],
    }
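The loop can be sketched end-to-end against the sample table above (assuming, since the question doesn't show it, that a blank line ends the section, and splitting on '|' because the columns are pipe-separated):

```python
data = """\
-------------------------------------------------------------------------------
    Section Name    |   Budget  |   Size    | Prev Size | Overflow
--------------------------------+-----------+-----------+-----------+----------
.text.resident      |   712924  |   794576  |   832688  |   YES
.rodata.resident    |   77824   |   77560   |   21496   |   YES
"""

memdict = {}
keeper = False
for line in data.splitlines():
    if not keeper:
        if 'Section Name' in line:   # everything before the header row is skipped
            keeper = True
        continue
    if '-------------------' in line:  # ruler line under the header
        continue
    if not line.strip():               # a blank line ends the section (assumption)
        break
    parts = [p.strip() for p in line.split('|')]
    memdict[parts[0]] = {
        'Budget': int(parts[1]),
        'Size': int(parts[2]),
        'Prev Size': int(parts[3]),
        'Overflow': parts[4],
    }

print(memdict['.text.resident'])
# {'Budget': 712924, 'Size': 794576, 'Prev Size': 832688, 'Overflow': 'YES'}
```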
I've written the decaf grammar specified in the cs143 course.
Here is my code.
import sys
from lark import Lark, Transformer, v_args
decaf_grammar = r"""
start : PROGRAM
PROGRAM : DECL+
DECL : VARIABLEDECL | FUNCTIONDECL | CLASSDECL | INTERFACEDECL
VARIABLEDECL : VARIABLE ";"
VARIABLE : TYPE "ident"
TYPE : "int" | "double" | "bool" | "string" | "ident" | TYPE "[]"
FUNCTIONDECL : ( TYPE "ident" "(" FORMALS ")" STMTBLOCK ) | ( "void" "ident" "(" FORMALS ")" STMTBLOCK )
FORMALS : VARIABLE ("," VARIABLE)*
CLASSDECL : "class" "ident" ["extends" "ident"] ["implements" "ident" ("," "ident")*] "{" FIELD* "}"
FIELD : VARIABLEDECL | FUNCTIONDECL
INTERFACEDECL : "interface" "ident" "{" PROTOTYPE* "}"
PROTOTYPE : (TYPE "ident" "(" FORMALS ")" ";") | ("void" "ident" "(" FORMALS ")" ";")
STMTBLOCK : "{" VARIABLEDECL* STMT* "}"
STMT : ( EXPR? ";") | IFSTMT | WHILESTMT | FORSTMT | BREAKSTMT | RETURNSTMT | RETURNSTMT | PRINTSTMT | STMTBLOCK
IFSTMT : "if" "(" EXPR ")" STMT ["else" STMT]
WHILESTMT : "while" "(" EXPR ")" STMT
FORSTMT : "for" "(" EXPR? ";" EXPR ";" EXPR? ")" STMT
RETURNSTMT : "return" EXPR? ";"
BREAKSTMT : "break" ";"
PRINTSTMT : "print" "(" EXPR ("," EXPR)* ")" ";"
EXPR : (LVALUE "=" EXPR) | CONSTANT | LVALUE | "this" | CALL | "(" EXPR ")" | (EXPR "+" EXPR) | (EXPR "-" EXPR) | (EXPR "*" EXPR) | (EXPR "/" EXPR) | (EXPR "%" EXPR) | ("-" EXPR) | (EXPR "<" EXPR) | (EXPR "<=" EXPR) | (EXPR ">" EXPR) | (EXPR ">=" EXPR) | (EXPR "==" EXPR) | (EXPR "!=" EXPR) | (EXPR "&&" EXPR) | (EXPR "||" EXPR) | ("!" EXPR) | ("ReadInteger" "(" ")") | ("ReadLine" "(" ")") | ("new" "ident") | ("NewArray" "(" EXPR "," TYPE ")")
LVALUE : "ident" | (EXPR "." "ident") | (EXPR "[" EXPR "]")
CALL : ("ident" "(" ACTUALS ")") | (EXPR "." "ident" "(" ACTUALS ")")
ACTUALS : EXPR ("," EXPR)* | ""
CONSTANT : "intConstant" | "doubleConstant" | "boolConstant" | "stringConstant" | "null"
"""
class TreeToJson(Transformer):
    # @v_args(inline=True)
    def string(self, s):
        return s[1:-1].replace('\\"', '"')

json_parser = Lark(decaf_grammar, parser='lalr', lexer='standard', transformer=TreeToJson())
parse = json_parser.parse

def test():
    test_json = '''
    {
    }
    '''
    j = parse(test_json)
    print(j)
    import json
    assert j == json.loads(test_json)

if __name__ == '__main__':
    test()
    # with open(sys.argv[1]) as f:
    #     print(parse(f.read()))
It throws
RecursionError: maximum recursion depth exceeded.
I'm using lark for the first time.
The problem is that you're not distinguishing between lark's rules and terminals. Terminals (and only terminals should be named in capitals) should match strings, not the structure of your grammar.
The key property of terminals is that, unlike rules, they cannot be recursive. Because of that, lark struggles to build your grammar and runs into infinite recursion and a stack overflow.
Try using sys.setrecursionlimit(xxxx), where xxxx is the maximum recursion depth you want.
To learn more, visit docs.python.org/3.
I am a beginner with pyparsing but have experience with other parsing environments.
On my first small demo project I encountered strange parse-action behavior: the parse action of the base token (ident_simple) is called twice for each ident_simple token.
import io, sys
from pyparsing import *

def pa_ident_simple(s, l, t):
    print('ident_simple: ' + str(t))

def pa_ident_combined(s, l, t):
    print('ident_combined: ' + str(t))

def make_grammar():
    number = Word(nums)
    ident_simple = Word(alphas, alphanums + "_")
    ident_simple.setParseAction(pa_ident_simple)
    ident_combined = Combine(ident_simple + Literal('.') + ident_simple)
    ident_combined.setParseAction(pa_ident_combined)
    integer = number
    elems = (ident_combined | ident_simple | integer)
    grammar = OneOrMore(elems) + StringEnd()
    return grammar
if __name__ == "__main__":
inp_str = "UUU FFF.XXX GGG"
grammar = make_grammar()
print (inp_str, "--->", grammar.parseString( inp_str ))
For the 'ident_combined' token it looks good: the parse action is called once for each 'ident_simple' sub-token and once for the combined token.
I believe the combined token is the problem: the parse action of 'ident_simple' is called only once if 'ident_combined' is removed.
Can anybody give me a hint on how to combine tokens correctly?
Thanks for any help.
Update: While playing around I used the class "Or" instead of "MatchFirst":
elems = ( ident_combined ^ ident_simple ^ integer)
This showed better behavior (in my opinion).
Output of original grammar (using "MatchFirst"):
ident_simple: ['UUU']
ident_simple: ['UUU']
ident_simple: ['FFF']
ident_simple: ['XXX']
ident_combined: ['FFF.XXX']
ident_simple: ['GGG']
ident_simple: ['GGG']
UUU FFF.XXX GGG ---> ['UUU', 'FFF.XXX', 'GGG']
Output of modified grammar (using "Or"):
ident_simple: ['UUU']
ident_simple: ['FFF']
ident_simple: ['XXX']
ident_combined: ['FFF.XXX']
ident_simple: ['GGG']
UUU FFF.XXX GGG ---> ['UUU', 'FFF.XXX', 'GGG']
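The duplicate calls in the MatchFirst output are a consequence of backtracking: `elems` tries `ident_combined` first, its inner `ident_simple` matches 'UUU' and fires its action, then the combined match fails at the missing '.', and the `ident_simple` alternative matches the same text again, firing the action a second time. A minimal sketch of that mechanism:

```python
from pyparsing import Word, Combine, Literal, alphas

calls = []
simple = Word(alphas)
simple.setParseAction(lambda s, l, t: calls.append(t[0]))
combined = Combine(simple + Literal('.') + simple)

elems = combined | simple    # MatchFirst: `combined` is attempted first
elems.parseString('UUU')

# the inner `simple` fired once inside the failed `combined` attempt,
# then again when the `simple` alternative succeeded
print(calls)    # ['UUU', 'UUU']
```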
import codecs, os
import re
import string
import mysql
import mysql.connector

y_ = ""

'''Searching and reading text files from a folder.'''
for root, dirs, files in os.walk("/Users/ultaman/Documents/PAN dataset/Pan Plagiarism dataset 2010/pan-plagiarism-corpus-2010/source-documents/test1"):
    for file in files:
        if file.endswith(".txt"):
            x_ = codecs.open(os.path.join(root, file), "r", "utf-8-sig")
            for lines in x_.readlines():
                y_ = y_ + lines

'''Tokenizing the sentences of the text file.'''
from nltk.tokenize import sent_tokenize
raw_docs = sent_tokenize(y_)
tokenized_docs = [sent_tokenize(y_) for sent in raw_docs]

'''Removing punctuation marks.'''
regex = re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = ''
for review in tokenized_docs:
    new_review = ''
    for token in review:
        new_token = regex.sub(u'', token)
        if not new_token == u'':
            new_review += new_token
    tokenized_docs_no_punctuation += (new_review)
print(tokenized_docs_no_punctuation)

'''Connecting and inserting tokenized documents without punctuation in database field.'''
def connect():
    for i in range(len(tokenized_docs_no_punctuation)):
        conn = mysql.connector.connect(user='root', password='', unix_socket="/tmp/mysql.sock", database='test')
        cursor = conn.cursor()
        cursor.execute("""INSERT INTO splitted_sentences(sentence_id, splitted_sentences) VALUES(%s, %s)""",
                       (cursor.lastrowid, tokenized_docs_no_punctuation[i]))
        conn.commit()
        conn.close()

if __name__ == '__main__':
    connect()
After running the above code, the result in the database looks like:
2 | S | N |
| 3 | S | o |
| 4 | S | |
| 5 | S | d |
| 6 | S | o |
| 7 | S | u |
| 8 | S | b |
| 9 | S | t |
| 10 | S | |
| 11 | S | m |
| 12 | S | y |
| 13 | S |
| 14 | S | d
in the database.
It should be like:
1 | S | No doubt, my dear friend.
2 | S | no doubt.
I suggest making the following edits (use what you would like); this is what I used to get your code running. Your issue is that review in for review in tokenized_docs: is already a string, so token in for token in review: iterates over individual characters. To fix this I tried -
tokenized_docs = ['"No doubt, my dear friend, no doubt; but in the meanwhile suppose we talk of this annuity.', 'Shall we say one thousand francs a year."', '"What!"', 'asked Bonelle, looking at him very fixedly.', '"My dear friend, I mistook; I meant two thousand francs per annum," hurriedly rejoined Ramin.', 'Monsieur Bonelle closed his eyes, and appeared to fall into a gentle slumber.', 'The mercer coughed;\nthe sick man never moved.', '"Monsieur Bonelle."']
'''Removing punctuation marks.'''
regex = re.compile('[%s]' % re.escape(string.punctuation))
tokenized_docs_no_punctuation = []
for review in tokenized_docs:
    new_token = regex.sub(u'', review)
    if not new_token == u'':
        tokenized_docs_no_punctuation.append(new_token)
print(tokenized_docs_no_punctuation)
and got this -
['No doubt my dear friend no doubt but in the meanwhile suppose we talk of this annuity', 'Shall we say one thousand francs a year', 'What', 'asked Bonelle looking at him very fixedly', 'My dear friend I mistook I meant two thousand francs per annum hurriedly rejoined Ramin', 'Monsieur Bonelle closed his eyes and appeared to fall into a gentle slumber', 'The mercer coughed\nthe sick man never moved', 'Monsieur Bonelle']
The final format of the output is up to you. I prefer using lists. But you could concatenate this into a string as well.
nw = []
for review in tokenized_docs[0]:
    new_review = ''
    for token in review:
        new_token = regex.sub(u'', token)
        if not new_token == u'':
            new_review += new_token
    nw.append(new_review)
'''Inserting into database'''
def connect():
    for j in nw:
        conn = mysql.connector.connect(user='root', password='', unix_socket="/tmp/mysql.sock", database='Thesis')
        cursor = conn.cursor()
        cursor.execute("""INSERT INTO splitted_sentences(sentence_id, splitted_sentences) VALUES(%s, %s)""",
                       (cursor.lastrowid, j))
        conn.commit()
        conn.close()

if __name__ == '__main__':
    connect()
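As a side note, the INSERT reuses cursor.lastrowid as sentence_id, but on a fresh cursor that value is None before any insert has run; enumerating the ids in Python (or letting the database assign them) is more predictable, and the connection only needs to be opened once rather than per row. A sketch of that pattern, using the stdlib's sqlite3 as a stand-in for mysql.connector (sqlite uses ? placeholders where MySQL uses %s; the table name splitted_sentences is taken from the question):

```python
import sqlite3

sentences = ['No doubt my dear friend', 'Shall we say one thousand francs a year']

conn = sqlite3.connect(':memory:')   # stand-in for the MySQL connection
cursor = conn.cursor()
cursor.execute("CREATE TABLE splitted_sentences (sentence_id INTEGER, splitted_sentences TEXT)")

# open the connection once and enumerate ids instead of relying on lastrowid
for i, sent in enumerate(sentences, start=1):
    cursor.execute("INSERT INTO splitted_sentences (sentence_id, splitted_sentences) VALUES (?, ?)",
                   (i, sent))
conn.commit()
```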
def sss(request):
    handle = open('b.txt', 'r+')
    handle.write("I AM NEW FILE")
    var = handle.read()
    return HttpResponse(var)

urlpatterns = patterns('',
    ('^$', sss),
)
1. My b.txt has nothing in it.
2. When I run my code, it prints this:
I AM NEW FILE7 鸸?; ??x 鸸鸸v1鸸pZ€0 鸸鸸燛?鸸8N鸸鸸p 坮 愵) 犭 ?`16鸸鸸 S6鸸鸸榑 鸸? 鸸# ... (several kilobytes of similar binary garbage, in which fragments of Django's compiled defaultfilters.py are recognizable:)
* If value is 1, cand{{ value|pluralize:"y,ies" }} displays "1 candy".
* If value is 2, cand{{ value|pluralize:"y,ies" }} displays "2 candies".
D:\Python25\lib\site-packages\django\template\defaultfilters.py pluralize
Takes a phone number and converts it in to its numerical equivalent. phone2numeric
Why?
Thanks.
The only way I can reproduce this is by opening an existing non-empty file using 'r+' (are you absolutely sure it's empty?). In any event, opening the file in 'w+' mode truncates it.
What middleware are you using? I guess you have a lot of middleware installed, which explains some of the garbage.
For debugging, use the logging module to record what var was; otherwise you can't isolate the problem.
Also, should you convert the string to unicode before sending it off to HttpResponse?
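A sketch of why 'r+' can surface old bytes, and of the 'w+' fix; the seek(0) is needed because read() continues from wherever the write left the file position (the temp file here just simulates b.txt holding leftover content from an earlier run):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# simulate b.txt holding leftover content from an earlier run
with open(path, 'w') as f:
    f.write("OLD CONTENT LEFT OVER FROM BEFORE")

# 'r+' keeps the old content: the write overlays the start of the file,
# and read() then returns whatever old bytes follow the new text
with open(path, 'r+') as f:
    f.write("I AM NEW FILE")
    leftover = f.read()      # tail of the old content, i.e. garbage

# 'w+' truncates; seek(0) before reading returns exactly what was written
with open(path, 'w+') as f:
    f.write("I AM NEW FILE")
    f.seek(0)
    clean = f.read()

os.remove(path)
print(clean)                 # I AM NEW FILE
```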