I am trying to parse some Lua function calls with pyparsing. It works for almost everything I need, except one case where the last parameter is just a bare word.
This is my code. It is my first parser using pyparsing, but I did my best to structure it logically:
To explain my comments within the code:
trigger_async(<object>, <name>, <param>)
trigger_async(<parameters>)
<param> = <name> = <type>
from pyparsing import (Word, alphanums, Optional, Suppress, quotedString,
                       Group, ZeroOrMore, delimitedList, Combine)

def parse_events_from_text(text):
    variable = Word(alphanums + "-_.:()")
    # Entity on which the event will be triggered
    obj = variable.setResultsName("object")
    # Name of the event
    name = (quotedString + Optional(Suppress("..") + variable)).setResultsName("name")
    # Parameter list of the event
    paramName = variable.setResultsName("name")
    paramType = variable.setResultsName("type")
    param = Group(paramName + Suppress("=") + paramType).setResultsName("parameter")
    paramList = Group(Optional(Suppress("{") + ZeroOrMore(delimitedList(param)) + Suppress("}"))).setResultsName("parameters")
    function_parameter = obj | name | paramList
    # Function start ("async" is a reserved word since Python 3.7, hence the rename)
    trigger = "trigger"
    async_suffix = Optional("_async")
    # Function call
    functionOpening = Combine(trigger + async_suffix + "(").setResultsName("functionOpening")
    functionCall = ZeroOrMore(Group(functionOpening + delimitedList(function_parameter) + Suppress(")")))
    resultsList = functionCall.searchString(text)
    results = []
    for resultsL in resultsList:
        if len(resultsL) != 0 and resultsL not in results:
            results.append(resultsL)
    return results
So the parser was written for those kinds of events:
trigger(self._entity, 'game:construction:changed', { entity = target })
trigger_async(entity, 'game:heal:healer_damaged', { healer = entity })
trigger_async(entity, 'game:heal:healer_damaged', { healer = entity, entity = target, test = party })
trigger_async(entity, 'game:heal:healer')
trigger(entity.function(), 'game:heal:healer', {})
But the problem is if there aren't any curly braces:
trigger(entity, 'game:heal:healer', entity.test)
it won't work, because of my declared variable
variable = Word(alphanums + "-_.:()")
where parentheses are allowed, so the parser swallows the closing parenthesis that should end the function call. If I wrote
trigger(entity,'game:heal:healer',entity.test))
it would work.
I sat down and wanted to rewrite the parser, but I don't know how. Somehow I must specify that the variable is only valid if its parentheses are balanced (one opening and one closing), like so:
trigger(entity,'game:heal:healer',entity.test(input))
otherwise it must not eat up that closing parenthesis:
trigger(entity,'game:heal:healer',entity.test) <-- Variable, don't eat it!
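One way to express that (a sketch of mine, not tested against the full grammar): keep "()" out of the base word's character set, and allow an optional balanced (...) group after it. pyparsing's nestedExpr only matches balanced parentheses, and originalTextFor keeps the raw matched text; the names identifier and variable below are my own.
from pyparsing import Word, alphanums, Combine, Optional, nestedExpr, originalTextFor

# base name without "()" so a stray ")" is never swallowed
identifier = Word(alphanums + "-_.:")
# optionally followed by one balanced (...) group, e.g. a call like entity.test(input)
variable = Combine(identifier + Optional(originalTextFor(nestedExpr("(", ")"))))

print(variable.parseString("entity.test(input)"))  # -> ['entity.test(input)']
print(variable.parseString("entity.test"))         # -> ['entity.test']
With a variable like this, the closing parenthesis of the surrounding trigger(...) call is left over for Suppress(")") to consume.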
I am trying to scrape data from a Word document available at:
https://dl.dropbox.com/s/pj82qrctzkw9137/HE%20Distributors.docx
I need to scrape the Name, Address, City, State, and Email ID. I am able to scrape the Email using the code below.
import docx

content = docx.Document('HE Distributors.docx')
location = []
for i in range(len(content.paragraphs)):
    stat = content.paragraphs[i].text
    if 'Email' in stat:
        location.append(i)
for i in location:
    print(content.paragraphs[i].text)
I tried the steps mentioned in:
How to read data from .docx file in python pandas?
I need to convert this into a data frame with all the columns mentioned above, but I am still facing issues.
There are some inconsistencies in the document: phone numbers sometimes start with Tel:, sometimes with Tel.:, and once even with Te:; one of the emails is just on the last line for that distributor, without the Email: prefix; and the State isn't always on the last line. Still, for the most part, most of the data can be extracted with regex and/or splits.
The distributors are separated by empty lines, and the names are in a different color - so I defined this function to get the font color of any paragraph from its xml:
# from bs4 import BeautifulSoup
def getParaColor(para):
    try:
        return BeautifulSoup(
            para.paragraph_format.element.xml, 'xml'
        ).find('color').get('w:val')
    except:
        return ''
The try...except hasn't been necessary yet, but just in case...
(The xml is actually also helpful for double-checking that .text hasn't missed anything - in my case, I noticed that the email for Shri Adhya Educational Books wasn't getting extracted.)
Then, you can process the paragraphs from docx.Document with a function like:
# import re
def splitParas(paras):
    ptc = [(
        p.text, getParaColor(p), p.paragraph_format.element.xml
    ) for p in paras]
    curSectn = 'UNKNOWN'
    splitBlox = [{}]
    for pt, pc, px in ptc:
        # double-check for missing text
        xmlText = BeautifulSoup(px, 'xml').text
        xmlText = ' '.join([s for s in xmlText.split() if s != ''])
        if len(xmlText) > len(pt): pt = xmlText
        # initiate
        if not pt:
            if splitBlox[-1] != {}:
                splitBlox.append({})
            continue
        if pc == '20752E':
            curSectn = pt.strip()
            continue
        if splitBlox[-1] == {}:
            splitBlox[-1]['section'] = curSectn
            splitBlox[-1]['raw'] = []
            splitBlox[-1]['Name'] = []
            splitBlox[-1]['address_raw'] = []
        # collect
        splitBlox[-1]['raw'].append(pt)
        if pc == 'D12229':
            splitBlox[-1]['Name'].append(pt)
        elif re.search("^Te.*:.*", pt):
            splitBlox[-1]['tel_raw'] = re.sub("^Te.*:", '', pt).strip()
        elif re.search("^Mob.*:.*", pt):
            splitBlox[-1]['mobile_raw'] = re.sub("^Mob.*:", '', pt).strip()
        elif pt.startswith('Email:') or re.search(".*[@].*[.].*", pt):
            # looks like an email (contains "@" and ".")
            splitBlox[-1]['Email'] = pt.replace('Email:', '').strip()
        else:
            splitBlox[-1]['address_raw'].append(pt)
    # some cleanup
    if splitBlox[-1] == {}: splitBlox = splitBlox[:-1]
    for i in range(len(splitBlox)):
        addrsParas = splitBlox[i]['address_raw']  # for later
        # join lists into strings
        splitBlox[i]['Name'] = ' '.join(splitBlox[i]['Name'])
        for k in ['raw', 'address_raw']:
            splitBlox[i][k] = '\n'.join(splitBlox[i][k])
        # search address for City, State and PostCode
        apLast = addrsParas[-1].split(',')[-1]
        maybeCity = [ap for ap in addrsParas if '–' in ap]
        if '–' not in apLast:
            splitBlox[i]['State'] = apLast.strip()
        if maybeCity:
            maybePIN = maybeCity[-1].split('–')[-1].split(',')[0]
            maybeCity = maybeCity[-1].split('–')[0].split(',')[-1]
            splitBlox[i]['City'] = maybeCity.strip()
            splitBlox[i]['PostCode'] = maybePIN.strip()
        # add mobile to tel
        if 'mobile_raw' in splitBlox[i]:
            if 'tel_raw' not in splitBlox[i]:
                splitBlox[i]['tel_raw'] = splitBlox[i]['mobile_raw']
            else:
                splitBlox[i]['tel_raw'] += (', ' + splitBlox[i]['mobile_raw'])
            del splitBlox[i]['mobile_raw']
        # split tel [as needed]
        if 'tel_raw' in splitBlox[i]:
            tel_i = [t.strip() for t in splitBlox[i]['tel_raw'].split(',')]
            telNum = []
            for t in range(len(tel_i)):
                if '/' in tel_i[t]:
                    tns = [t.strip() for t in tel_i[t].split('/')]
                    tel1 = tns[0]
                    telNum.append(tel1)
                    for tn in tns[1:]:
                        telNum.append(tel1[:-1*len(tn)] + tn)
                else:
                    telNum.append(tel_i[t])
            splitBlox[i]['Tel_1'] = telNum[0]
            splitBlox[i]['Tel'] = telNum[0] if len(telNum) == 1 else telNum
    return splitBlox
(Since I was getting font color anyway, I decided to add another column called "section" to put East/West/etc. in. And I added "PostCode" too, since it seems to be on the other side of "City"...)
Since "raw" is saved, any other value can be double checked manually at least.
The function combines "Mobile" into "Tel" even though they're extracted with separate regex.
I'd say "Tel_1" is fairly reliable, but some of the inconsistent patterns mean that other numbers in "Tel" might come out incorrect if they were separated with '/'.
Also, "Tel" is either a string or a list of strings depending on how many numbers there were in "tel_raw".
After this, you can view it as a DataFrame with:
# import docx
# import pandas
content = docx.Document('HE Distributors.docx')
# pandas.DataFrame(splitParas(content.paragraphs))  # <-- all columns
pandas.DataFrame(splitParas(content.paragraphs))[[
    'section', 'Name', 'address_raw', 'City',
    'PostCode', 'State', 'Email', 'Tel_1', 'tel_raw'
]]
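One optional follow-up (my suggestion, not part of the answer above): since "Tel" can be a string or a list depending on the row, normalizing it right after building the DataFrame saves downstream code from branching on type. Rows that never had a number would still need an extra guard.
import pandas

df = pandas.DataFrame(splitParas(content.paragraphs))
# make "Tel" uniformly a list of strings (it is a bare string for single numbers)
df['Tel'] = df['Tel'].apply(lambda t: t if isinstance(t, list) else [t])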
I am trying to build a simple web application with 3 web services. Two of my web services are supposed to validate whether a student exists in a course or not. This is done by a simple SELECT query. My third web service should add a student into the database, but only if the student does exist in the specific course.
This is my validation WS, which should return true/false.
@app.route('/checkStudOnCourse/<string:AppCode>/<string:ideal>', methods=["GET"])
def checkStudOnCourseWS(AppCode, ideal):
    myCursor3 = mydb.cursor()
    # note: building SQL by string concatenation is open to SQL injection;
    # a parameterized query would be safer
    query3 = ("SELECT studentID FROM Ideal.course WHERE applicationCode = '"
              + AppCode + "' AND Ideal = '" + ideal + "'")
    myCursor3.execute(query3)
    myresult3 = myCursor3.fetchall()
    if len(myresult3) == 0:
        return render_template('Invalid.html')
    else:
        return jsonify({'Student in course ': True})
Below is regResultat, which should do a SQL INSERT into the database. I only want the INSERT to happen if the above result is "True". I know I have not written the INSERT query yet, but that is not the problem.
What I am unsure about is: how can I make the submitted data be INSERTED only if the validation WS returns "True"?
@app.route('/register', methods=["POST", "GET"])
def regResultat():
    if request.method == "POST":
        Period = request.form['period']
        #ProvNr = request.form['provNr']
        Grade = request.form['grade']
        Applicationcode = request.form['applicationcode']
        #Datum = request.form['datum']
        Ideal = request.form['ideal']
        CheckStudOnCourse = 'http://127.0.0.1:5000/checkAppCodeWS/'+Applicationcode+'/'+Ideal
        CheckStudOnResp = requests.get(CheckStudOnCourse)
First, such syntax:
if len(myresult3) == 0 can be simplified to if not myresult3, because Python implicitly evaluates an empty sequence as False in a boolean context.
Secondly, once you have returned from a function, there is no need to write an else statement:
if len(myresult3) == 0:
    return render_template('Invalid.html')  # <-- returns here when no student
                                            #     was found; otherwise the
                                            #     function keeps going
return jsonify({'Student in course ': True})  # <-- reached only when a row exists
Focusing on your issue, you could do this:
Get your value from the WS:
CheckStudOnCourse = 'http://127.0.0.1:5000/checkAppCodeWS/'+Applicationcode+'/'+Ideal
CheckStudOnResp = requests.get(CheckStudOnCourse)
Extract the JSON from it:
if CheckStudOnResp.status_code == 200:
    result_as_json = CheckStudOnResp.json()  # <-- it is now a dict
Do some checks:
if result_as_json.get('Student in course', False): # I highly suggest to use other
# convention to name json keys
# e.g. Student in course ->
# student_exists_in_course
# do your code here
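Putting it together, a minimal sketch of the /register handler (my assembly of the pieces above, not verbatim from the answer; the URL follows the validation route defined in the question, and the INSERT stays a placeholder):
@app.route('/register', methods=["POST", "GET"])
def regResultat():
    if request.method == "POST":
        applicationcode = request.form['applicationcode']
        ideal = request.form['ideal']
        check_url = ('http://127.0.0.1:5000/checkStudOnCourse/'
                     + applicationcode + '/' + ideal)
        resp = requests.get(check_url)
        if resp.status_code == 200:
            try:
                result = resp.json()
            except ValueError:
                # the validation WS rendered Invalid.html instead of JSON
                result = {}
            if result.get('Student in course ', False):
                # student exists on the course -> safe to run the INSERT here
                return jsonify({'registered': True})
        return render_template('Invalid.html')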
I am making a script to post-process a database.
I want my script to return the path and the name of files when I enter some details (version, option, etc.), and it looks like this...
def file_info(version, something, option):
    ####### This part is the DB
    ## Version-path
    PATH_ver1 = '/ver1'
    PATH_something1 = '/something1'
    ## Name of files, there are a bunch of these
    DEF_ver2_something2 = '/name of file'
    DEF_ver2_something1_option4 = '/name of file'
    ####### Now the postprocessing starts
    ## path setting - the other variables also follow this pattern
    if version == 'ver1':
        PATH_VER = PATH_ver1
    elif version == 'ver2':
        PATH_VER = PATH_ver2
    ## Concatenating the paths
    PATH_BIN = PATH_TOP + PATH_VER + PATH_TYP + PATH_OPT
    ## Setting the file name
    BIN_FILE = 'DEF_' + version + '_' + something + '_' + option
    return PATH_BIN, BIN_FILE

def main():
    version = input("version here")
    something = input("something here")
    option = input("option here")
    output = file_info(version, something, option)
When I enter something, I can get the path of files correctly, but the file name comes out as the name of the variable (the string 'DEF_ver2_...'), instead of its value '/name of file'.
Also, since my items have two values each, it is not a one-to-one matching, so I don't think I can use a plain dictionary: each item has one key (DEF_***) and two corresponding values (PATH_BIN and BIN_FILE). How can I solve this?
It sounds like what you need are nested dictionaries:
#!python3.6
####### This part is the DB
PATH_TOP = '/TOP/PATH'

## Version-path
PATH = {'ver1': '/ver1',
        'ver2': '/ver2',
        'something1': '/something1',
        'something2': '/something2',
        '': '',
        'opt1': '/opt1',
        'opt2': '/opt2',
        'opt3': '/opt3',
        'opt4': '/Totally/Different/PATH'
        }

## Name of files
DEF = {'ver1': {'something1': {'': 'v1s1',
                               'opt1': 'v1s1o1',
                               'opt2': 'v1s1o2'},
                'something2': {'opt2': 'v1s2o2',
                               'opt3': 'v1s2o3'}},
       'ver2': {'something1': {'opt1': 'v2s1o1',
                               'opt2': 'v2s1o2',
                               'opt4': 'v2s1o4'},
                'something2': {'': 'v2s2'}}}

def file_info(version, something, option):
    PATH_BIN = PATH_TOP + PATH[version] + PATH[something] + PATH[option]
    BIN_FILE = DEF[version][something][option]
    return PATH_BIN, BIN_FILE

def prompt(item, values):
    lst = "'" + "','".join(values) + "'"
    while True:
        selection = input(f'{item}({lst})? ')
        if selection in values:
            break
        print('not a choice')
    return selection

def main():
    version = prompt('Version', DEF)
    something = prompt('Something', DEF[version])
    option = prompt('Option', DEF[version][something])
    output = file_info(version, something, option)
    print(output)

if __name__ == '__main__':
    main()
Output:
C:\>test.py
Version('ver1','ver2')? ver1
Something('something1','something2')? some
not a choice
Something('something1','something2')? something2
Option('opt2','opt3')? opt2
('/TOP/PATH/ver1/something2/opt2', 'v1s2o2')
C:\>test.py
Version('ver1','ver2')? ver1
Something('something1','something2')? something1
Option('','opt1','opt2')?
('/TOP/PATH/ver1/something1', 'v1s1')
I've written a quick little program to scrape book data off of a UNESCO website which contains information about book translations. The code is doing what I want it to, but by the time it's processed about 20 countries, it's using ~6GB of RAM. Since there are around 200 I need to process, this isn't going to work for me.
I'm not sure where all the RAM usage is coming from, so I don't know how to reduce it. I assume it's the dictionary holding all the book information, but I'm not positive. Should I simply run the program once per country rather than processing the lot of them, or is there a better way to do it?
This is the first time I've written anything like this, and I'm a pretty novice, self-taught programmer, so please point out any significant flaws in the code, or improvement tips you have that may not directly relate to the question at hand.
This is my code, thanks in advance for any assistance.
from __future__ import print_function
import os
import urllib2
from bs4 import BeautifulSoup, SoupStrainer

''' Set list of countries and their code for niceness in explaining what
is actually going on as the program runs. '''
countries = {"AFG": "Afghanistan", "ALA": "Aland Islands", "DZA": "Algeria"}

'''List of country codes; since dictionaries aren't sorted in any
way, this makes processing easier to deal with if it fails at
some point, mid run.'''
country_code_list = ["AFG", "ALA", "DZA"]

base_url = "http://www.unesco.org/xtrans/bsresult.aspx?lg=0&c="
destination_directory = "/Users/robbie/Test/"
only_restable = SoupStrainer(class_="restable")
class Book(object):
    def set_author(self, book):
        '''Parse the webpage to find author names. Finds last name, then
        first name of original author(s) and sets the Book object's
        author attribute to the resulting string.'''
        authors = ""
        author_last_names = book.find_all('span', class_="sn_auth_name")
        author_first_names = book.find_all('span', class_="sn_auth_first_name")
        if author_last_names == []:
            # default, mirroring set_translator_name's early return
            self.author = " "
            return
        for author in author_last_names:
            try:
                first_name = author_first_names.pop()
                authors = authors + author.getText() + ', ' + first_name.getText()
            except IndexError:
                authors = authors + author.getText()
        self.author = authors

    def set_quality(self, book):
        '''Check to see if the book page is using Quality, then set it if so.'''
        quality = book.find_all('span', class_="sn_auth_quality")
        if len(quality) == 0:
            self.quality = " "
        else:
            self.quality = quality[0].contents[0]

    def set_target_title(self, book):
        target_title = book.find_all('span', class_="sn_target_title")
        if len(target_title) == 0:
            self.target_title = " "
        else:
            self.target_title = target_title[0].contents[0]

    def set_target_language(self, book):
        target_language = book.find_all('span', class_="sn_target_lang")
        if len(target_language) == 0:
            self.target_language = " "
        else:
            self.target_language = target_language[0].contents[0]

    def set_translator_name(self, book):
        translators = ""
        translator_last_names = book.find_all('span', class_="sn_transl_name")
        translator_first_names = book.find_all('span', class_="sn_transl_first_name")
        if translator_first_names == [] and translator_last_names == []:
            self.translators = " "
            return None
        for translator in translator_last_names:
            try:
                first_name = translator_first_names.pop()
                translators = translators + translator.getText() + ',' + first_name.getText()
            except IndexError:
                translators = translators + translator.getText()
        self.translators = translators

    def set_published_city(self, book):
        published_city = book.find_all('span', class_="place")
        if len(published_city) == 0:
            self.published_city = " "
        else:
            self.published_city = published_city[0].contents[0]

    def set_publisher(self, book):
        # note: this reuses class_="place", the same selector as
        # set_published_city; possibly a copy-paste slip in the original
        publisher = book.find_all('span', class_="place")
        if len(publisher) == 0:
            self.publisher = " "
        else:
            self.publisher = publisher[0].contents[0]

    def set_published_country(self, book):
        published_country = book.find_all('span', class_="sn_country")
        if len(published_country) == 0:
            self.published_country = " "
        else:
            self.published_country = published_country[0].contents[0]

    def set_year(self, book):
        year = book.find_all('span', class_="sn_year")
        if len(year) == 0:
            self.year = " "
        else:
            self.year = year[0].contents[0]

    def set_pages(self, book):
        pages = book.find_all('span', class_="sn_pagination")
        if len(pages) == 0:
            self.pages = " "
        else:
            self.pages = pages[0].contents[0]

    def set_edition(self, book):
        edition = book.find_all('span', class_="sn_editionstat")
        if len(edition) == 0:
            self.edition = " "
        else:
            self.edition = edition[0].contents[0]

    def set_original_title(self, book):
        original_title = book.find_all('span', class_="sn_orig_title")
        if len(original_title) == 0:
            self.original_title = " "
        else:
            self.original_title = original_title[0].contents[0]

    def set_original_language(self, book):
        languages = ''
        original_languages = book.find_all('span', class_="sn_orig_lang")
        for language in original_languages:
            languages = languages + language.getText() + ', '
        self.original_languages = languages

    def export(self, country):
        '''Pull the text from the contents of the Book object's attributes
        and write them to the CSV file for the country in which the book
        was published.'''
        file_name = os.path.join(destination_directory + country + ".csv")
        with open(file_name, "a") as by_country_csv:
            print(self.author.encode('UTF-8') + " & " +
                  self.quality.encode('UTF-8') + " & " +
                  self.target_title.encode('UTF-8') + " & " +
                  self.target_language.encode('UTF-8') + " & " +
                  self.translators.encode('UTF-8') + " & " +
                  self.published_city.encode('UTF-8') + " & " +
                  self.publisher.encode('UTF-8') + " & " +
                  self.published_country.encode('UTF-8') + " & " +
                  self.year.encode('UTF-8') + " & " +
                  self.pages.encode('UTF-8') + " & " +
                  self.edition.encode('UTF-8') + " & " +
                  self.original_title.encode('UTF-8') + " & " +
                  self.original_languages.encode('UTF-8'), file=by_country_csv)

    def __init__(self, book, country):
        '''Initialize the Book object by feeding it the HTML for its row.'''
        self.set_author(book)
        self.set_quality(book)
        self.set_target_title(book)
        self.set_target_language(book)
        self.set_translator_name(book)
        self.set_published_city(book)
        self.set_publisher(book)
        self.set_published_country(book)
        self.set_year(book)
        self.set_pages(book)
        self.set_edition(book)
        self.set_original_title(book)
        self.set_original_language(book)
def get_all_pages(country, base_url):
    '''Find the total number of results for a country by adding the
    ISO 3166-1 alpha-3 country code to the URL; the result pages are
    then crawled 10 entries at a time. Returns an int.'''
    base_page = urllib2.urlopen(base_url + country)
    page = BeautifulSoup(base_page, parse_only=only_restable)
    result_number = page.find_all('td', class_="res1", limit=1)
    if not result_number:
        return 0
    str_result_number = str(result_number[0].getText())
    results_total = int(str_result_number.split('/')[1])
    page.decompose()
    return results_total
def build_list(country_code_list, countries):
    '''Build the list of all the books for each country, and export
    each batch to that country's CSV file.'''
    for country in country_code_list:
        print("Processing %s now..." % countries[country])
        results_total = get_all_pages(country, base_url)
        for url in range(results_total):
            if url % 10 == 0:
                all_books = []
                target_page = urllib2.urlopen(base_url + country + "&fr=" + str(url))
                page = BeautifulSoup(target_page, parse_only=only_restable)
                books = page.find_all('td', class_="res2")
                for book in books:
                    all_books.append(Book(book, country))
                page.decompose()
                for title in all_books:
                    title.export(country)
    return

if __name__ == "__main__":
    build_list(country_code_list, countries)
    print("Completed.")
I guess I'll just list off some of the problems or possible improvements in no particular order:
Follow PEP 8.
Python's convention is snake_case for functions and variables: set_author rather than setAuthor, published_country rather than PublishedCountry, etc. You can even change the names of some of the things you're calling: for one, BeautifulSoup supports findAll for compatibility, but find_all is recommended.
Besides naming, PEP 8 also specifies a few other things; for example, you'd want to rewrite this:
if len(result_number) == 0 : return 0
as this:
if len(result_number) == 0:
    return 0
or even taking into account the fact that empty lists are falsy:
if not result_number:
    return 0
Pass a SoupStrainer to BeautifulSoup.
The information you're looking for is probably in only part of the document; you don't need to parse the whole thing into a tree. Pass a SoupStrainer as the parse_only argument to BeautifulSoup. This should reduce memory usage by discarding unnecessary parts early.
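For instance, a minimal sketch (the class to keep, "restable", is taken from the question; html stands for the fetched page):
from bs4 import BeautifulSoup, SoupStrainer

only_restable = SoupStrainer(class_="restable")
# only elements matching the strainer end up in the parsed tree
page = BeautifulSoup(html, parse_only=only_restable)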
decompose the soup when you're done with it.
Python primarily uses reference counting, so removing all circular references (as decompose does) lets reference counting, its primary garbage-collection mechanism, free up a lot of memory. Python also has a semi-traditional garbage collector to deal with circular references, but reference counting is much faster.
Don't make Book.__init__ write things to disk.
In most cases, I wouldn't expect just creating an instance of a class to write something to disk. Remove the call to export; let the user call export if they want it to be put on the disk.
Stop holding on to so much data in memory.
You're accumulating all this data into a dictionary just to export it afterwards. The obvious thing to do to reduce memory is to dump it to disk as soon as possible. Your comment indicates that you're putting it in a dictionary to be flexible; but that doesn't mean you have to collect it all in a list: use a generator, yielding items as you scrape them. Then the user can iterate over it just like a list:
for book in scrape_books():
book.export()
…but with the advantage that at most one book will be kept in memory at a time.
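A minimal sketch of such a generator, reusing the question's names (get_all_pages, base_url, only_restable, Book); treat it as an outline, since export would still need the country passed through or stored on the Book:
def scrape_books(country_code_list):
    for country in country_code_list:
        results_total = get_all_pages(country, base_url)
        # one fetch per page of 10 results, same offsets as "url % 10 == 0"
        for offset in range(0, results_total, 10):
            target_page = urllib2.urlopen(base_url + country + "&fr=" + str(offset))
            page = BeautifulSoup(target_page, parse_only=only_restable)
            for book in page.find_all('td', class_="res2"):
                yield Book(book, country)  # at most one page of books in memory
            page.decompose()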
Use the functions in os.path rather than munging paths yourself.
Your code right now is rather fragile when it comes to path names. If I accidentally removed the trailing slash from destination_directory, something unintended would happen. Using os.path.join prevents that from happening and deals with cross-platform differences:
>>> os.path.join("/Users/robbie/Test/", "USA")
'/Users/robbie/Test/USA'
>>> os.path.join("/Users/robbie/Test", "USA") # still works!
'/Users/robbie/Test/USA'
>>> # or say we were on Windows:
>>> os.path.join(r"C:\Documents and Settings\robbie\Test", "USA")
'C:\\Documents and Settings\\robbie\\Test\\USA'
Abbreviate attrs={"class":...} to class_=....
BeautifulSoup 4.1.2 introduces searching with class_, which removes the need for the verbose attrs={"class":...}.
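Applied to one of the question's own lookups, the two forms are equivalent:
# verbose attrs-dict form
book.find_all('span', attrs={'class': "sn_auth_first_name"})
# class_ shortcut (BeautifulSoup >= 4.1.2)
book.find_all('span', class_="sn_auth_first_name")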
I imagine there are even more things you can change, but that's quite a few to start with.
What do you want the book list for, in the end? You should export each book at the end of the "for url in range" block (inside it), and do without the all_books list. If you really need a list, define exactly what info you will need, rather than keeping full Book objects.
I want to add external information to the ParseResults before returning. I return the results of parsing via asXML(). The external data is represented as a dictionary, so that it gets rendered as XML along with the final parse results.
This is the code before adding the external data:
from pyparsing import *

# a hypothetical outer parser, with an unparsed SkipTo element
color = oneOf("red orange yellow green blue purple")
expression = SkipTo("XXX") + Literal("XXX").setResultsName('ex') + color.setResultsName('color')
data = "JUNK 100 200 10 XXX green"
print expression.parseString(data).dump()

# main grammar
def minorgrammar(toks):
    # a simple inner grammar
    integer = Word(nums)
    # integer("A") is shorthand for integer.setResultsName('A')
    grammar2 = integer("A") + integer("B") + integer("C")
    # use scanString to find the inner grammar
    # (since we just want the first occurrence, we can use next
    # instead of a for loop with a break)
    t, s, e = next(grammar2.scanString(toks[0], maxMatches=1))
    # remove 0'th element from toks
    del toks[0]
    # return a new ParseResults, the sum of t and everything
    # in toks after toks[0] was removed
    return t + toks

grammar1 = expression.setParseAction(minorgrammar)
x = grammar1.parseString(data).asXML("main")
print x
The output is:
<main>
  <A>100</A>
  <B>200</B>
  <C>10</C>
  <ex>XXX</ex>
  <color>green</color>
</main>
The code after adding the external data:
    ...
    external_data = {'name':'omar', 'age':'40'}
    return t + toks + ParseResults(external_data)

grammar1 = expression.setParseAction(minorgrammar)
x = grammar1.parseString(data).asXML("main")
print x
The output:
<main>
  <A>100</A>
  <B>200</B>
  <C>10</C>
  <ex>XXX</ex>
  <color>green</color>
  <ITEM>{'age': '40', 'name': 'omar'}</ITEM>
</main>
I want the output in this form:
<main>
  <A>100</A>
  <B>200</B>
  <C>10</C>
  <ex>XXX</ex>
  <color>green</color>
  <name>omar</name>
  <age>40</age>
</main>
What is the error in that code? Thanks
One problem is this fragment:
external_data = {'name':'omar', 'age':'40'}
return t + toks + ParseResults(external_data)
ParseResults will take a dict as a constructor argument, but I don't think it will do what you want - it just assigns the dict as its 0th element, and does not assign any results names.
You can assign named values into a ParseResults by using its dict-style assignment:
pr = ParseResults(['omar','40'])
for k, v in external_data.items():
    pr[k] = v
See if this gets you closer to your desired format.
EDIT: Hmm, it seems asXML is more fussy about how named results get added to the ParseResults than just setting the name. This will work better:
def addNamedResult(pr, value, name):
    addpr = ParseResults([value])
    addpr[name] = value
    pr += addpr
And then in your parse action, add the values with their names using:
addNamedResult(toks, 'omar', 'name')
addNamedResult(toks, '40', 'age')
Thanks very much, Paul. I modified your function to add a dictionary of data:
...
    external_data = {'name':'omar', 'age':'40'}
    return t + toks + addDicResult(external_data)
...

def addDicResult(data):  # renamed from "dict" so it doesn't shadow the built-in
    pr = ParseResults([])
    for k, v in data.items():
        addpr = ParseResults([v])
        addpr[k] = v
        pr += addpr
    return pr
The output:
<main>
  <A>100</A>
  <B>200</B>
  <C>10</C>
  <ex>XXX</ex>
  <color>green</color>
  <age>40</age>
  <name>omar</name>
</main>