generated code is not indent - python

I am modifying an OIL file using a Python script. I have written an EBNF grammar to convert the OIL file to an AST using Grako, and I generate the OIL file back from the AST using codegen, but the generated OIL file is not indented (everything is generated on one line).
Sample OIL file:
CPU dummy
{
    OS StdOS
    {
        SCALABILITYCLASS = SC1;
        STATUS = EXTENDED;
    };
};
Generated Oil:
CPUdummy{OSStdOS{SCALABILITYCLASS=SC1;STATUS=EXTENDED;};};
EBNF grammar:
file = [{Comments_option}] OIL_version Includes [implementation_definition] application_definition {object_definition_list};
Includes
= "#include" include_name ;
include_name
= ?/[!-_A-Za-z0-9]+/? ;
OIL_version
= "OIL_VERSION" "=" version description ";" ;
version = '"' ver '"';
implementation_definition
= "IMPLEMENTATION" name "{" implementation_spec_list "}" description ";";
implementation_spec_list
= [implementation_spec] ;
implementation_spec
= object "{" implementation_def "}" description ";";
object = "OS"
| "TASK"
| "COUNTER"
| "ALARM"
| "RESOURCE"
| "EVENT"
| "ISR"
| "MESSAGE"
| "COM"
| "NM"
| "APPMODE"
| "IPDU"
| "APPLICATION";
implementation_list
= [implementation_def]
| [implementation_list implementation_def] ;
implementation_def
= impl_attr_def
| impl_ref_def;
impl_attr_def
= "UINT32" auto_specifier number_range attribute_name multiple_specifier default_number description ";"
| ( "INT32" | "UINT64" | "INT64" ) auto_specifier number_range attribute_name multiple_specifier default_number description ";"
| "FLOAT" auto_specifier float_range attribute_name multiple_specifier default_float description ";"
| "ENUM" auto_specifier enumeration attribute_name multiple_specifier default_name description ";"
| "STRING" auto_specifier attribute_name multiple_specifier default_string description ";"
| "BOOLEAN" auto_specifier bool_values attribute_name multiple_specifier default_bool description ";" ;
impl_parameter_list
= [( "{" {implementation_def} [implementation_def] "}" )] ;
auto_specifier
= ["WITH_AUTO"];
number_range
= [( "[" ( number ".." | ( number ) ) number "]" )];
number_list
= number
| number_list "," number ;
default_number
= [( "=" ( number | "NO_DEFAULT" | "AUTO" ) )];
description
= [( ":" '"' comments '"' )] ;
float_range
= [( "[" float ".." float "]" )] ;
default_float
= [( "=" ( float | "NO_DEFAULT" | "AUTO" ) )] ;
enumeration
= "[" enumerator_list "]";
enumerator_list
= enumerator
| enumerator_list "," enumerator ;
enumerator
= name [impl_parameter_list] description;
bool_values
= [( "[" "TRUE" impl_parameter_list description "," "FALSE" impl_parameter_list description "]" )] ;
default_name
= [( "=" ( name | "NO_DEFAULT" | "AUTO" ) )] ;
default_string
= [( "=" ( string | "NO_DEFAULT" | "AUTO" ) )] ;
default_bool
= [( "=" ( boolean | "NO_DEFAULT" | "AUTO" ) )] ;
impl_ref_def
= object_ref_type reference_name multiple_specifier description ";";
object_ref_type
= "OS_TYPE"
| "TASK_TYPE"
| "COUNTER_TYPE"
| "ALARM_TYPE"
| "RESOURCE_TYPE"
| "EVENT_TYPE"
| "ISR_TYPE"
| "MESSAGE_TYPE"
| "COM_TYPE"
| "NM_TYPE"
| "APPMODE_TYPE"
| "IPDU_TYPE";
reference_name
= name
| object;
multiple_specifier
= [( "[" "]" )] ;
application_definition
= "CPU" name "{" [Includes] { ( parameter_list Comments_option ) } "}" description ";" ;
object_definition_list
= [object_definition];
Comment_list
= object_definition | parameter comments ;
object_definition
= object_name "{" { parameter_list Comments_option } "}" description ";" ;
object_name
= object name;
parameter_list
= [parameter];
parameter
= attribute_name "=" attribute_value [ "{" { ( parameter [Comments_option] ) } "}" ] description ";" ;
attribute_name
= name
| object;
attribute_value
= boolean
| float
| number
| string
| "AUTO"
| '"' string '"';
Comments_option
= ( Single_line Multi_line );
Single_line = {"//" comments};
Multi_line = {"/*#*" Comment_list "*#*/"};
name = ?/[-_A-Za-z0-9]+/?;
string = ?/[-_A-Za-z0-9_*, ]+/?;
ver = ?/[0-9.0-9]+/?;
comments = ?/[-_A-Za-z0-9 *#]+/? ;
boolean = "FALSE"
| "TRUE";
number = dec_number
| hex_number;
dec_number
= sign int_digits;
sign = [( "+" | "-" )] ;
int_digits
= zero_digit
| pos_digit
| pos_digit dec_digits ;
dec_digits
= {dec_digit} [dec_digit] ;
float = ver;
exponent = [( ( "e" | "E" ) sign dec_digits )] ;
zero_digit
= "0";
pos_digit
= "1"
| "2"
| "3"
| "4"
| "5"
| "6"
| "7"
| "8"
| "9";
dec_digit
= zero_digit
| pos_digit;
hex_number
= "0x" {hex_digit};
hex_digit
= "A"
| "B"
| "C"
| "D"
| "E"
| "F"
| "a"
| "b"
| "c"
| "d"
| "e"
| "f"
| "0"
| "1"
| "2"
| "3"
| "4"
| "5"
| "6"
| "7"
| "8"
| "9";
Should the indentation be handled by Grako or by codegen? How can I indent the generated code? Thanks.

import json
from grako.util import asjson
print(json.dumps(asjson(myast), indent=4))
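The JSON dump above pretty-prints the AST itself, not the OIL text. If you need indented OIL, one option is to write the emitter yourself and track the nesting depth while walking the tree. This is a minimal sketch that assumes a simplified, hypothetical AST in which an object is a `(keyword, name, children)` tuple and a parameter is an `(attribute, value)` tuple. Grako's real AST will have a different shape, so the unpacking would need adapting, and descriptions and comments are not handled.

```python
# Hypothetical simplified AST (not Grako's real output):
#   object    -> (keyword, name, [children])
#   parameter -> (attribute, value)
def emit(node, depth=0, indent="    "):
    """Return the OIL text for `node` as a list of indented lines."""
    pad = indent * depth
    if len(node) == 3:  # object: KEYWORD name { ... };
        keyword, name, children = node
        lines = [f"{pad}{keyword} {name}", f"{pad}{{"]
        for child in children:
            lines.extend(emit(child, depth + 1, indent))
        lines.append(f"{pad}}};")
        return lines
    attribute, value = node  # parameter: NAME = VALUE;
    return [f"{pad}{attribute} = {value};"]


tree = ("CPU", "dummy", [
    ("OS", "StdOS", [
        ("SCALABILITYCLASS", "SC1"),
        ("STATUS", "EXTENDED"),
    ]),
])

print("\n".join(emit(tree)))
```

With this tree the output has the same layout as the sample file above. Returning a list of lines and joining once at the end keeps the depth bookkeeping in a single place.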

Related

How to change date format using datetime from openpyxl? [duplicate]

I would like to create a function that converts a date string into a date format for further processing. I managed to handle the fields that have no date ("lastLogonTimestamp": []), but I can't write an if condition that distinguishes between the two date string formats.
My code:
Input (JSON file):
{
    "entries": [
        {
            "attributes": {
                "cn": "Diana Troy",
                "sn": "Troy",
                "givenName": "Diana",
                "sAMAccountName": "wonder-woman",
                "userAccountControl": 514,
                "whenCreated": "2015-09-22 10:21:02+00:00",
                "whenChanged": "2023-01-11 09:33:59+00:00",
                "lastLogonTimestamp": [],
                "pwdLastSet": "2023-01-11 09:33:59.608543+00:00",
                "accountExpires": "9999-12-31 23:59:59.999999+00:00"
            },
            "dn": "CN=Diana Troy,OU=Users,OU=DC-COMICS,DC=universum,DC=local"
        }
    ]
}
Code:
code:
import json

# encoded_retrieved_users: path to the JSON file (defined elsewhere)
with open(encoded_retrieved_users, 'r', encoding="UTF-8") as file:
    data = json.load(file)
retrieved_users = data['entries']
retrieved_users.sort(key=lambda d: d["attributes"]["sn"])  # sort by surname (sn)

def is_value(var):
    if type(var) is int:
        val = var
    elif type(var) is str:
        val = var
    else:
        val = None
    return val

def to_short_date_format(string_to_format):
    if is_value(string_to_format) == None:
        short_date = "NO DATE"
    else:
        short_date = string_to_format
    return short_date

for user in retrieved_users:
    attributes = user['attributes']
    userAccountControl = is_value(attributes['userAccountControl'])
    whenCreated = to_short_date_format(attributes['whenCreated'])
    whenChanged = to_short_date_format(attributes['whenChanged'])
    lastLogonTimestamp = to_short_date_format(attributes['lastLogonTimestamp'])
    pwdLastSet = to_short_date_format(attributes['pwdLastSet'])
    accountExpires = to_short_date_format(attributes['accountExpires'])
    print("userAccountControl | " + str(userAccountControl) + " | " + str(type(userAccountControl)))
    print("whenCreated | " + whenCreated + " | " + str(type(whenCreated)))
    print("whenChanged | " + whenChanged + " | " + str(type(whenChanged)))
    print("lastLogonTimestamp | " + str(lastLogonTimestamp) + " | " + str(type(lastLogonTimestamp)))
    print("pwdLastSet | " + str(pwdLastSet) + " | " + str(type(pwdLastSet)))
    print("accountExpires | " + accountExpires + " | " + str(type(accountExpires)))
    print("----------------------------------")
Output:
wonder-woman
userAccountControl | 514 | <class 'int'>
whenCreated | 2015-09-22 10:21:02+00:00 | <class 'str'>
whenChanged | 2023-01-11 09:33:59+00:00 | <class 'str'>
lastLogonTimestamp | NO DATE | <class 'str'>
pwdLastSet | 2023-01-11 09:33:59.608543+00:00 | <class 'str'>
accountExpires | 9999-12-31 23:59:59.999999+00:00 | <class 'str'>
----------------------------------
What I would like to get:
def to_short_date_format(string_to_format):
    if is_value(string_to_format) == None:
        short_date = "NO DATE"
    else:
        if string_to_format == <string format %Y-%m-%d %H:%M:%S%z>:
            dt = datetime.strptime(string_to_format, '%Y-%m-%d %H:%M:%S%z')
            short_date = dt.strftime('%Y-%m-%d')
        elif string_to_format == <string format %Y-%m-%d %H:%M:%S.%f%z>:
            dt = datetime.strptime(string_to_format, '%Y-%m-%d %H:%M:%S.%f%z')
            short_date = dt.strftime('%Y-%m-%d')
    return short_date
Output:
wonder-woman
userAccountControl | 514
whenCreated | 2015-09-22
whenChanged | 2023-01-11
lastLogonTimestamp | NO DATE
pwdLastSet | 2023-01-11
accountExpires | 9999-12-31
----------------------------------

I was able to get the output by reusing your snippet. The only catch is that the date format you were trying to extract was wrong:
from datetime import datetime

my_date = "2013-06-10 11:50:16+00:00"
# print(my_date)
# print(type(my_date))
dt = datetime.strptime(my_date, '%Y-%m-%d %H:%M:%S%z')
print(dt)
dt_new = dt.strftime('%Y-%m-%d')
print(dt_new)
Output:
2013-06-10 11:50:16+00:00
2013-06-10
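To handle both layouts from the question without writing one explicit if per format, you can try each format in turn and catch the ValueError that datetime.strptime raises when the string does not match. This is a sketch that assumes these two formats are the only ones in the export and that you are on Python 3.7 or later, where %z accepts an offset written as +00:00.

```python
from datetime import datetime

# Assumed: the only two timestamp layouts in the JSON export.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S%z")


def to_short_date_format(value):
    """Return 'YYYY-MM-DD' for a recognised timestamp string, else 'NO DATE'."""
    if not isinstance(value, str):
        return "NO DATE"  # e.g. "lastLogonTimestamp": []
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # wrong layout, try the next one
    return "NO DATE"  # the string matched none of the known layouts


for raw in ["2015-09-22 10:21:02+00:00",
            "2023-01-11 09:33:59.608543+00:00",
            "9999-12-31 23:59:59.999999+00:00",
            []]:
    print(to_short_date_format(raw))
```

Unlike the original helper, this version returns "NO DATE" for any non-string value, including integers. On Python 3.7+, datetime.fromisoformat should also accept both of these layouts directly, which would remove the need for the format list.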

Using data class with strings

I'm simplifying a project, and it's my first time splitting an entire script into modules/packages. I have the following class:
class Decorators:
    def __init__(self):
        pass

    decorator = r'''
/>
( //-------------------------------------(
(*)OXOXOXOXO(*>======================================\
( \\---------------------------------------)
\>
'''
    special_decorator = r'''
.
/ )) |\ ) ).
c--. (\ ( `. / ) (\ ( `. ). ( (
| | )) ) ) ( ( `.`. ) ) ( ( ) )
| | ( ( / _..----.._ ) | ( ( _..----.._ ( (
,-. | |---) V.'-------.. `-. )-/.-' ..------ `--) \._
| /===========| | ( | ) ( ``-.`\/'.-'' ( ) ``-._
| | / / / / / | |---------------------> <-------------------------_>=-
| \===========| | ..-'./\.`-.. _,,-'
`-' | |-------._------''_.-'----`-._``------_.-----'
| | ``----'' ``----''
| |
c--`
'''
I want to know how I can access the strings in this class and print them from the main function, for example:
import decorators

def main():
    print(decorators.special_decorator)
I want it to print the string that I'm accessing from the module.
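Because decorator and special_decorator are defined inside the class body, they are class attributes. The module itself has no name special_decorator, so decorators.special_decorator raises an AttributeError. They can be reached through the class, for example decorators.Decorators.special_decorator, and no instance is needed. The sketch below is self-contained: the art is shortened, and it assumes that in the real project the class lives in decorators.py.

```python
class Decorators:
    # Class attributes: they belong to the class, so no instance is needed.
    decorator = r'''
  />
 ( //-----------(
(*)OXOXOXOXO(*>====>
 ( \\-----------)
  \>
'''
    special_decorator = r'''
       .
      / ))
 c--. (\ (
'''


def main():
    # From a separate module this would be:
    #   from decorators import Decorators
    print(Decorators.decorator)
    print(Decorators.special_decorator)


if __name__ == "__main__":
    main()
```

Another option is to define the strings at module level in decorators.py, without the class. Then decorators.special_decorator works exactly as written in the question. The empty __init__ adds nothing and can be dropped either way.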

Cannot maintain focus on element in Selenium Python driver

I am trying to control a web page from Python so that I can run a script and download the corresponding CSV file.
The page has a dashboard menu with a "Search" button among other buttons. Clicking Search shows a search text box where you can enter code and press Enter to run it.
Now I need to find the element for this search box. In Chrome's Inspect panel, it looks like this:
So I used the following code. I also used ActionChains to keep the focus on the search box before reading the code from a text file and sending it to that box.
import time

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def run_code():
    """ Function to copy the code in Search and run it
    """
    search_button = driver.find_element_by_link_text("Search")
    search_button.click()
    time.sleep(2)
    with open('data_download_code.txt', 'r') as f:
        code_file = f.read()
    content_box = driver.find_element_by_class_name("ace_content")
    # Getting the focus on the element
    actions = ActionChains(driver)
    actions.move_to_element(content_box)
    actions.click()
    actions.perform()  # queued actions only run when perform() is called
    content_box.send_keys(code_file, Keys.ENTER)
    #content_box.submit()
However, it throws an error saying the element cannot be focused.
I am not sure whether I picked the right selector for the search box from the attached HTML, or whether it is just a focus issue. I did use ActionChains there to get the focus.
I want the code to read the text from the txt file, send it to the search box, and press Enter to run it.
WebDriverException: Message: unknown error: cannot focus element
(Session info: chrome=71.0.3578.98)
Edit 3:
I am now able to get the element for Search, and the script copies the code from the txt file into the search box, but not all of the code is copied correctly, so the search fails. Please see the full code below and how much of it got copied.
sourcetype=perf_log_bizx
(host=pc*bcf* OR host=pc*bsfapi* OR servername=pc*bcf* OR servername=pc*bsfapi*) OR
(host=sc*bcf* OR host=sc*bsfapi* OR servername=sc*bcf* OR servername=sc*bsfapi*) OR
(host=ac*bcf* OR host=ac*bsfapi* OR servername=ac*bcf* OR servername=ac*bsfapi*) OR
NOT "/perfLogServlet" NOT "REQ-\[*" earliest="12/18/2018:08:00:00" latest="12/18/2018:12:00:00"
| rex field=_raw "\[(?<client_ip>[\d\.]+)\]\s+\[(?<company_id>[^,]+),(?<company_name>[^,]+),(?<company_schema>[\w\.]+),(?<dbpool>[^,]+),(?<user_id>[^,]+),\S+\]\s+\S+\s+\S+\s+(?<render_time>\d+)\s(?<server_time>\d+)\s(?<end2end_time>\d+)\s+\S+\s\S+\s\[.*\]\s+\d+\-(?<call_id>\d+)\s+(?<module_id>(-|\w+))\s(?<page_name>(-|\w+))\s(?<page_qualifier>(-|\w+))"
| rex field=_raw "\[\[(?<MemoryAllocated>\d+)\w+\s+(?<CPUTimeTotal>\d+)\w+\s+(?<CPUTimeUser>\d+)\w+\s+(?<CPUTimeSystem>\d+)\w+\s+(?<FileRead>\d+)\w+\s+(?<FileWrite>\d+)\w+\s+(?<NetworkRead>\d+)\w+\s+(?<NetworkWrite>\d+)\w+\s+(?<NotClosedFiles>(\d+|-))\s+(?<NotClosedSockets>(\d+|-))\s+\]\]\s+(?<SQLInvokes>\d+)\s+(?<SQLTimeTotal>\d+)"
| eval company_id = ifnull(CMID, company_id)
| eval dbpool = ifnull(DPN, dbpool)
| eval company_schema =ifnull(SN, company_schema)
| eval user_id = ifnull(UID, user_id)
| eval module_id = ifnull(MID, module_id)
| eval page_name = ifnull(PID, page_name)
| eval page_qualifier = ifnull(PQ, page_qualifier)
| rex field=CAID "\d+\-(?<CAID>\d+)"
| eval call_id = ifnull(CAID, call_id)
| eval render_time = ifnull(RDT, render_time)
| eval server_time = ifnull(SVT, server_time)
| eval end2end_time = ifnull(EET, end2end_time)
| eval MemoryAllocated = ifnull(MEM, MemoryAllocated)
| eval CPUTimeTotal = ifnull(CPU, CPUTimeTotal)
| eval CPUTimeUser = ifnull(UCPU, CPUTimeUser)
| eval CPUTimeSystem = ifnull(SCPU, CPUTimeSystem)
| eval FileRead = ifnull(FRE, FileRead)
| eval FileWrite = ifnull(FWR, FileWrite)
| eval NetworkRead = ifnull(NRE, NetworkRead)
| eval NetworkWrite = ifnull(NWR, NetworkWrite)
| eval NotClosedFiles = ifnull(0, NotClosedFiles)
| eval NotClosedSockets = ifnull(0, NotClosedSockets)
| eval SQLInvokes = ifnull(SQLC, SQLInvokes)
| eval SQLTimeTotal = ifnull(SQLT, SQLTimeTotal)
| eval request_type = if(call_id=0,"Root", "Subaction")
| search call_id = 0 AND page_name!="NONE"
| eval full_page_name = module_id + "-" + page_name + "-" + page_qualifier + " [" + request_type + "]"
| eval has_open_sockets = if ( ifnull(NotClosedSockets,0) > 0, 1, 0)
| eval has_open_files = if ( ifnull(NotClosedFiles,0) > 0, 1, 0)
| eval time = strftime( _time, "%Y-%m-%d %H:%M:%S" )
| eval server = ifnull(servername, host)
| rex field=server "\w(?<dc>\d+)\w"
| eval dc_name = "DC" + tostring(dc)
| eval server_type = if (substr(server, 1, 2) = "pc", "PROD", if (substr(server, 1, 2) = "sc", "PREVIEW", if (substr(server, 1, 2) = "ac", "QA", "OTHER") ) )
| eval dc_company_user = dc + "|" + company_id + "|" + sha256( user_id )
| table
time,
dc_name,
server_type,
dbpool,
company_id,
full_page_name,
dc_company_user,
server_time,
end2end_time,
SQLInvokes,
SQLTimeTotal,
MemoryAllocated
Edit4:
The code read from the txt file also contains the \n characters, so the string has \n in it, and I suspect that might be causing issues when it is sent through WebDriver to the search box. Is it possible to read the code exactly as it appears in the edit above?
'sourcetype=perf_log_bizx\n(host=pc*bcf* OR host=pc*bsfapi* OR servername=pc*bcf* OR servername=pc*bsfapi*) OR\n(host=sc*bcf* OR host=sc*bsfapi* OR servername=sc*bcf* OR servername=sc*bsfapi*) OR\n(host=ac*bcf* OR host=ac*bsfapi* OR servername=ac*bcf* OR servername=ac*bsfapi*) OR\nNOT "/perfLogServlet" NOT "REQ-\\[*" earliest="12/18/2018:08:00:00" latest="12/18/2018:12:00:00" \n \n | rex field=_raw "\\[(?<client_ip>[\\d\\.]+)\\]\\s+\\[(?<company_id>[^,]+),(?<company_name>[^,]+),(?<company_schema>[\\w\\.]+),(?<dbpool>[^,]+),(?<user_id>[^,]+),\\S+\\]\\s+\\S+\\s+\\S+\\s+(?<render_time>\\d+)\\s(?<server_time>\\d+)\\s(?<end2end_time>\\d+)\\s+\\S+\\s\\S+\\s\\[.*\\]\\s+\\d+\\-(?<call_id>\\d+)\\s+(?<module_id>(-|\\w+))\\s(?<page_name>(-|\\w+))\\s(?<page_qualifier>(-|\\w+))"\n | rex field=_raw "\\[\\[(?<MemoryAllocated>\\d+)\\w+\\s+(?<CPUTimeTotal>\\d+)\\w+\\s+(?<CPUTimeUser>\\d+)\\w+\\s+(?<CPUTimeSystem>\\d+)\\w+\\s+(?<FileRead>\\d+)\\w+\\s+(?<FileWrite>\\d+)\\w+\\s+(?<NetworkRead>\\d+)\\w+\\s+(?<NetworkWrite>\\d+)\\w+\\s+(?<NotClosedFiles>(\\d+|-))\\s+(?<NotClosedSockets>(\\d+|-))\\s+\\]\\]\\s+(?<SQLInvokes>\\d+)\\s+(?<SQLTimeTotal>\\d+)"\n \n | eval company_id = ifnull(CMID, company_id)\n | eval dbpool = ifnull(DPN, dbpool)\n | eval company_schema =ifnull(SN, company_schema)\n | eval user_id = ifnull(UID, user_id)\n \n | eval module_id = ifnull(MID, module_id)\n | eval page_name = ifnull(PID, page_name)\n | eval page_qualifier = ifnull(PQ, page_qualifier)\n \n | rex field=CAID "\\d+\\-(?<CAID>\\d+)"\n | eval call_id = ifnull(CAID, call_id)\n \n | eval render_time = ifnull(RDT, render_time)\n | eval server_time = ifnull(SVT, server_time)\n | eval end2end_time = ifnull(EET, end2end_time)\n | eval MemoryAllocated = ifnull(MEM, MemoryAllocated)\n | eval CPUTimeTotal = ifnull(CPU, CPUTimeTotal)\n | eval CPUTimeUser = ifnull(UCPU, CPUTimeUser)\n | eval CPUTimeSystem = ifnull(SCPU, CPUTimeSystem)\n | eval FileRead = ifnull(FRE, FileRead)\n | eval FileWrite = ifnull(FWR, 
FileWrite)\n | eval NetworkRead = ifnull(NRE, NetworkRead)\n | eval NetworkWrite = ifnull(NWR, NetworkWrite)\n | eval NotClosedFiles = ifnull(0, NotClosedFiles)\n | eval NotClosedSockets = ifnull(0, NotClosedSockets)\n | eval SQLInvokes = ifnull(SQLC, SQLInvokes)\n | eval SQLTimeTotal = ifnull(SQLT, SQLTimeTotal)\n \n | eval request_type = if(call_id=0,"Root", "Subaction")\n \n| search call_id = 0 AND page_name!="NONE"\n \n | eval full_page_name = module_id + "-" + page_name + "-" + page_qualifier + " [" + request_type + "]"\n | eval has_open_sockets = if ( ifnull(NotClosedSockets,0) > 0, 1, 0)\n | eval has_open_files = if ( ifnull(NotClosedFiles,0) > 0, 1, 0)\n | eval time = strftime( _time, "%Y-%m-%d %H:%M:%S" )\n | eval server = ifnull(servername, host)\n | rex field=server "\\w(?<dc>\\d+)\\w"\n | eval dc_name = "DC" + tostring(dc)\n | eval server_type = if (substr(server, 1, 2) = "pc", "PROD", if (substr(server, 1, 2) = "sc", "PREVIEW", if (substr(server, 1, 2) = "ac", "QA", "OTHER") ) )\n | eval dc_company_user = dc + "|" + company_id + "|" + sha256( user_id )\n \n| table\n time,\n dc_name,\n server_type,\n dbpool,\n company_id,\n full_page_name,\n dc_company_user,\n server_time,\n end2end_time,\n SQLInvokes,\n SQLTimeTotal,\n MemoryAllocated'

You should send keys to the input field, not to the parent div. Try this instead:
content_box = driver.find_element_by_css_selector("div.ace_content input")
content_box.send_keys(code_file, Keys.ENTER)
or
content_box = driver.find_element_by_class_name('ace_text-input')
content_box.send_keys(code_file, Keys.ENTER)
Also note that you most likely won't need to use ActionChains.

content_box=driver.find_element_by_class_name("ace_content")
This code will result in content_box being a "div" element. You can't send keys to a div element. Inspect that div to find a "textarea" or "input" element, and use that as your content_box.

On top of @Andersson's answer (which you should accept, by the way; he did solve your problem), let me help you with stripping the \n from the source text. This code:
with open('data_download_code.txt', 'r') as f:
    code_file = f.read()
uses the read() method, which returns the raw contents of the file with the EOL (end-of-line) characters intact. This, though:
code_file = f.read().splitlines()
returns it (in code_file) as a list of strings, one list member per line of the file, without the EOL characters. Now the question is what to replace the EOL characters with. I'm not familiar with the language in the file, so it's up to you to decide.
Say it is a semicolon (;). This is how you transform the list back into a string:
code_file = ';'.join(code_file)
This concatenates all list members into a single string, using that character as the delimiter. Naturally, you can replace the character with whatever is applicable:
code_file = ' '.join(code_file) # a whitespace character
code_file = '\t'.join(code_file) # a tab
code_file = '\\n'.join(code_file) # a literal backslash followed by n
code_file = 'whatever?'.join(code_file) # you name it
So the final form is:
with open('data_download_code.txt', 'r') as f:
    code_file = f.read().splitlines()
code_file = ';'.join(code_file)
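Joining the raw lines this way also keeps the blank lines in the query file. The repr in the question contains several "\n \n" runs, and each one turns into stray separators. The sketch below strips each line and skips the blank ones before joining. It assumes the search language treats a single space the same way as a line break, which is worth checking for your query. flatten_query is a hypothetical helper name.

```python
import io


def flatten_query(stream, sep=" "):
    """Join the non-blank lines of a text stream into a single line.

    Assumes the target search language treats `sep` like a line break.
    """
    stripped = (line.strip() for line in stream)
    return sep.join(line for line in stripped if line)


# Stand-in for open('data_download_code.txt'); a real file object works the same way.
sample = io.StringIO("sourcetype=perf_log_bizx\n"
                     "(host=pc*bcf*) OR\n"
                     " \n"
                     " | table time, dc_name\n")
print(flatten_query(sample))
```

With a real file, you would write `with open('data_download_code.txt') as f: code_file = flatten_query(f)`. Because the function iterates over the stream, it never needs the intermediate list returned by splitlines().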

pyparsing nested structure not working as expected

I'm trying to parse a simple JSON-like structure into Python dicts and then turn it into a proper JSON structure. The block is as follows:
###################################################
# HEADER TEXT
# HEADER TEXT
###################################################
NAME => {
NAME => VALUE,
NAME => VALUE,
NAME => VALUE,
NAME => {
NAME => {
NAME => VALUE, NAME => VALUE, NAME => VALUE,
},
} # comment
}, # more comments
and so on, repeated. Rules:
NAME = alphanums and _
VALUE = decimal(6) | hex (0xA) | list of hex ([0x1,0x2]) | text in brackets([A]) | string("A")
I set up the following grammar:
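from pyparsing import (Dict, Forward, Group, Literal, Optional, Suppress, Word,
                       alphanums, dblQuotedString, delimitedList,
                       pythonStyleComment, removeQuotes)

cfgName = Word(alphanums+"_")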
cfgString = dblQuotedString().setParseAction(removeQuotes)
cfgNumber = Word("0123456789ABCDEFx")
LBRACK, RBRACK, LBRACE, RBRACE = map(Suppress, "[]{}")
EQUAL = Literal('=>').suppress()
cfgObject = Forward()
cfgValue = Forward()
cfgElements = delimitedList(cfgValue)
cfgArray = Group(LBRACK + Optional(cfgElements, []) + RBRACK)
cfgValue << (cfgString | cfgNumber | cfgArray | cfgName | Group(cfgObject))
memberDef = Group(cfgName + EQUAL + cfgValue)
cfgMembers = delimitedList(memberDef)
cfgObject << Dict(LBRACE + Optional(cfgMembers) + RBRACE)
cfgComment = pythonStyleComment
cfgObject.ignore(cfgComment)
EDIT: I've managed to isolate the problem. Proper JSON is
{member,member,member}
However, my structure is:
{member,member,member,}
The last member in every nested structure is followed by a trailing comma, and I don't know how to account for that in the grammar.
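One way to accept the trailing comma is to follow the delimitedList with an optional, suppressed comma. With that change, both {a, b, c} and {a, b, c,} parse. The sketch below reuses the grammar from the question and requires pyparsing to be installed. The cfgFile top-level rule and the sample text are illustrative additions, not part of the original. Newer pyparsing releases (3.1+) should also offer an allow_trailing_delim option on DelimitedList, but the Optional form below does not depend on the version.

```python
from pyparsing import (Dict, Forward, Group, Literal, Optional, Suppress, Word,
                       alphanums, dblQuotedString, delimitedList,
                       pythonStyleComment, removeQuotes)

cfgName = Word(alphanums + "_")
cfgString = dblQuotedString().setParseAction(removeQuotes)
cfgNumber = Word("0123456789ABCDEFx")
LBRACK, RBRACK, LBRACE, RBRACE, COMMA = map(Suppress, "[]{},")
EQUAL = Literal("=>").suppress()

cfgObject = Forward()
cfgValue = Forward()
cfgElements = delimitedList(cfgValue)
cfgArray = Group(LBRACK + Optional(cfgElements, []) + RBRACK)
cfgValue << (cfgString | cfgNumber | cfgArray | cfgName | Group(cfgObject))
memberDef = Group(cfgName + EQUAL + cfgValue)
# The fix: allow one optional trailing comma after the last member.
cfgMembers = delimitedList(memberDef) + Optional(COMMA)
cfgObject << Dict(LBRACE + Optional(cfgMembers) + RBRACE)

# Illustrative top level: the file itself is a comma-separated member list.
cfgFile = Dict(cfgMembers)
cfgFile.ignore(pythonStyleComment)  # propagates to all sub-expressions

sample = """
###################################################
# HEADER TEXT
###################################################
OUTER => {
    A => 6,
    B => 0xA,
    C => [0x1,0x2],
    INNER => {
        DEEP => { X => "hi", Y => [A], },
    } # comment
}, # more comments
"""

result = cfgFile.parseString(sample, parseAll=True)
print(result.asDict())
```

Because delimitedList backtracks when no member follows a comma, Optional(COMMA) can consume the trailing comma without making the grammar ambiguous.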

SimpleParse not showing the result tree

I am working with Google Protocol Buffers, and I am trying to parse a .proto file using SimpleParse in Python.
I am using an EBNF grammar with SimpleParse. Parsing reports success, but the result tree is empty, and I am not sure what is going wrong. Any help would be appreciated.
Following is the grammar file:
proto ::= ( message / extend / enum / import / package / option / ';' )*
import ::= 'import' , strLit , ';'
package ::= 'package' , ident , ( '.' , ident )* , ';'
option ::= 'option' , optionBody , ';'
optionBody ::= ident , ( '.' , ident )* , '=' , constant
message ::= 'message' , ident , messageBody
extend ::= 'extend' , userType , '{' , ( field / group / ';' )* , '}'
enum ::= 'enum' , ident , '{' , ( option / enumField / ';' )* , '}'
enumField ::= ident , '=' , intLit , ';'
service ::= 'service' , ident , '{' , ( option / rpc / ';' )* , '}'
rpc ::= 'rpc' , ident , '(' , userType , ')' , 'returns' , '(' , userType , ')' , ';'
messageBody ::= '{' , ( field / enum / message / extend / extensions / group / option / ':' )* , '}'
group ::= label , 'group' , camelIdent , '=' , intLit , messageBody
field ::= label , type , ident , '=' , intLit , ( '[' , fieldOption , ( ',' , fieldOption )* , ']' )? , ';'
fieldOption ::= optionBody / 'default' , '=' , constant
extensions ::= 'extensions' , extension , ( ',' , extension )* , ';'
extension ::= intLit , ( 'to' , ( intLit / 'max' ) )?
label ::= 'required' / 'optional' / 'repeated'
type ::= 'double' / 'float' / 'int32' / 'int64' / 'uint32' / 'uint64' / 'sint32' / 'sint64' / 'fixed32' / 'fixed64' / 'sfixed32' / 'sfixed64' / 'bool' / 'string' / 'bytes' / userType
userType ::= '.'? , ident , ( '.' , ident )*
constant ::= ident / intLit / floatLit / strLit / boolLit
ident ::= [A-Za-z_],[A-Za-z0-9_]*
camelIdent ::= [A-Z],[\w_]*
intLit ::= decInt / hexInt / octInt
decInt ::= [1-9],[\d]*
hexInt ::= [0],[xX],[A-Fa-f0-9]+
octInt ::= [0],[0-7]+
floatLit ::= [\d]+ , [\.\d+]?
boolLit ::= 'true' / 'false'
strLit ::= quote ,( hexEscape / octEscape / charEscape / [^\0\n] )* , quote
quote ::= ['']
hexEscape ::= [\\],[Xx],[A-Fa-f0-9]
octEscape ::= [\\0]? ,[0-7]
charEscape ::= [\\],[abfnrtv\\\?'']
And this is the Python code that I am experimenting with:
from simpleparse.parser import Parser
from pprint import pprint

protoGrammar = ""
protoInput = ""
protoGrammarRoot = "proto"

with open("proto_grammar.ebnf", "r") as grammarFile:
    protoGrammar = grammarFile.read()

with open("sample.proto", "r") as protoFile:
    protoInput = protoFile.read().replace('\n', '')

parser = Parser(protoGrammar, protoGrammarRoot)
success, resultTree, newCharacter = parser.parse(protoInput)
pprint(protoInput)
pprint(success)
pprint(resultTree)
pprint(newCharacter)
And this is the proto file that I am trying to parse:
message AmbiguousMsg {
    optional string mypack_ambiguous_msg = 1;
    optional string mypack_ambiguous_msg1 = 1;
}
I get this output:
1
[]
0

I am new to Python, but I came up with this, although I am not entirely sure of your output format. Hopefully it will point you in the right direction. Feel free to modify the code below to suit your requirements.
#!/usr/bin/env python3
# (c) 2015 enthusiasticgeek for StackOverflow. Use the code in anyway you want but leave credits intact. Also use this code at your own risk. I do not take any responsibility for your usage - blame games and trolls will strictly *NOT* be tolerated.
import re
#data_types=['string','bool','enum','int32','uint32','int64','uint64','sint32','sint64','bytes','string','fixed32','sfixed32','float','fixed64','sfixed64','double']

#function # 1
#Generate list of units in the brackets
#================ tokens based on braces ====================
def find_balanced_braces(args):
    parts = []
    for arg in args:
        if '{' not in arg:
            continue
        chars = []
        n = 0  # current brace depth
        for c in arg:
            if c == '{':
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == '}':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    # closed a top-level block: store its inner text
                    parts.append(''.join(chars).lstrip().rstrip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts

#function # 2
#================ Retrieve Nested Levels ====================
def find_nested_levels(test, count_level):
    count_level = count_level + 1
    level = find_balanced_braces(test)
    if not bool(level):
        return count_level - 1
    else:
        return find_nested_levels(level, count_level)

#function # 3
#================ Process Nested Levels ====================
def process_nested_levels(test, count_level):
    count_level = count_level + 1
    level = find_balanced_braces(test)
    print("===== Level = " + str(count_level) + " =====")
    for i in range(len(level)):
        #print(level[i] + "\n")
        exclusive_level_messages = ''.join(level[i]).split("message")[0]
        exclusive_level_messages_tokenized = ''.join(exclusive_level_messages).split(";")
        #print(exclusive_level_messages + "\n")
        for j in range(len(exclusive_level_messages_tokenized)):
            pattern = exclusive_level_messages_tokenized[j].lstrip()
            print(pattern)
            #match = "\message \s*(.*?)\s*\{"+pattern
            #match_result = re.findall(match, level[i])
            #print(match_result)
    print("===== End Level =====")
    if not bool(level):
        return count_level - 1
    else:
        return process_nested_levels(level, count_level)
#============================================================

#=================================================================================
test_string = ("message a{ optional string level-i1.l1.1 = 1 [default = \"/\"]; "
               "message b{ required bool level-i1.l2.1 = 1; required fixed32 level-i1.l2.1 = 2; "
               "message c{ required string level-i1.l3.1 = 1; } "
               "} "
               "} "
               "message d{ required uint64 level-i2.l1.1 = 1; required double level-i2.l1.2 = 2; "
               "message e{ optional double level-i2.l2.1 = 1; "
               "message f{ optional fixed64 level-i2.l3.1 = 1; required fixed32 level-i2.l3.2 = 2; "
               "message g{ required bool level-i2.l4.1 = 2; } "
               "} "
               "} "
               "} "
               "message h{ required uint64 level-i3.l1.1 = 1; required double level-i3.l1.2 = 2; }")
#Right now I do not see point in replacing \n with blank space
with open("fileproto.proto", "r") as myfile:
    data = myfile.read().replace('\n', '\n')
print(data)
count_level = 0
#replace 'data' in the following line with 'test_string' for tests
nested_levels = process_nested_levels([data], count_level)
print("Total count levels depth = " + str(nested_levels))
print("========================\n")
My output looks as follows
// This defines protocol for a simple server that lists files.
//
// See also the nanopb-specific options in fileproto.options.
message ListFilesRequest {
    optional string path = 1 [default = "/"];
}
message FileInfo {
    required uint64 inode = 1;
    required string name = 2;
}
message ListFilesResponse {
    optional bool path_error = 1 [default = false];
    repeated FileInfo file = 2;
}
===== Level = 1 =====
optional string path = 1 [default = "/"]
required uint64 inode = 1
required string name = 2
optional bool path_error = 1 [default = false]
repeated FileInfo file = 2
===== End Level =====
===== Level = 2 =====
===== End Level =====
Total count levels depth = 1
========================
NOTE: After printing pattern, you can tokenize it further if necessary by taking pattern as input. I have included a commented-out regex example.
