Parsing named nested expressions with pyparsing - python

I'm trying to parse some data using pyparsing that looks (more or less) like this:
User.Name = Dave
User.Age = 42
Date = 2015/01/01
Begin Component List
Begin Component 2
1 some data = a value
2 another key = 999
End Component 2
Begin Another Component
a.key = 42
End Another Component
End Component List
Begin MoreData
Another = KeyPair
End MoreData
I've found some similar examples, but I've not done very well for myself.
parsing file with curley brakets
Parse line data until keyword with pyparsing
Here's what I have so far, but I keep hitting an error similar to: pyparsing.ParseException: Expected "End" (at char 26), (line:5, col:1)
from pyparsing import *
data = '''Begin A
hello
world
End A
'''
opener = Literal('Begin') + Word(alphas)
closer = Literal('End') + Word(alphas)
content = Combine(OneOrMore(~opener
+ ~closer
+ CharsNotIn('\n', exact=1)))
expr = nestedExpr(opener=opener, closer=closer, content=content)
parser = expr
res = parser.parseString(data)
print(res)
It's important the the words after "Begin" are captured, as these are the names of the dictionaries, as well as the key-value pairs. Where there is a number after the opener, e.g. "Begin Component 2" the "2" is the number of pairs which I don't need (presumably this is used by the original software?). Similarly, I don't need the numbers in the list (the "1" and "2").
Is nestedExpr the correct approach to this?

Related

Regex Python find everything between four characters

I have a string that holds data. And I want everything in between ({ and })
"({Simple Data})"
Should return "Simple Data"
Or regex:
s = '({Simple Data})'
print(re.search('\({([^})]+)', s).group(1))
Output:
'Simple Data'
You could try the following:
^\({(.*)}\)$
Group 1 will contain Simple Data.
See an example on regexr.
If the brackets are always positioned at the beginning and the end of the string, then you can do this:
l = "({Simple Data})"
print(l[2:-2])
Which resulst in:
"Simple Data"
In Python you can access single characters via the [] operator. With this you can access the sequence of characters starting with the third one (index = 2) up to the second-to-last (index = -2, second-to-last is not included in the sequence).
You could try this regex (?s)\(\{(.*?)\}\)
which simply captures the contents between the delimiters.
Beware though, this doesn't account for nesting.
If nesting is a concern, the best you can to with standard Python re engine
is to get the inner nest only, using this regex:
\(\{((?:(?!\(\{|\}\).)*)\}\)
Hereby I designed a tokenizer aimming at nesting data. OP should check out here.
import collections
import re
Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])
def tokenize(code):
token_specification = [
('DATA', r'[ \t]*[\w]+[\w \t]*'),
('SKIP', r'[ \t\f\v]+'),
('NEWLINE', r'\n|\r\n'),
('BOUND_L', r'\(\{'),
('BOUND_R', r'\}\)'),
('MISMATCH', r'.'),
]
tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
line_num = 1
line_start = 0
lines = code.splitlines()
for mo in re.finditer(tok_regex, code):
kind = mo.lastgroup
value = mo.group(kind)
if kind == 'NEWLINE':
line_start = mo.end()
line_num += 1
elif kind == 'SKIP':
pass
else:
column = mo.start() - line_start
yield Token(kind, value, line_num, column)
statements = '''
({Simple Data})
({
Parent Data Prefix
({Nested Data (Yes)})
Parent Data Suffix
})
'''
queue = collections.deque()
for token in tokenize(statements):
if token.typ == 'DATA' or token.typ == 'MISMATCH':
queue.append(token.value)
elif token.typ == 'BOUND_L' or token.typ == 'BOUND_R':
print(''.join(queue))
queue.clear()
Output of this code should be:
Simple Data
Parent Data Prefix
Nested Data (Yes)
Parent Data Suffix

Pyparsing: Parsing semi-JSON nested plaintext data to a list

I have a bunch of nested data in a format that loosely resembles JSON:
company="My Company"
phone="555-5555"
people=
{
person=
{
name="Bob"
location="Seattle"
settings=
{
size=1
color="red"
}
}
person=
{
name="Joe"
location="Seattle"
settings=
{
size=2
color="blue"
}
}
}
places=
{
...
}
There are many different parameters with varying levels of depth--this is just a very small subset.
It also might be worth noting that when a new sub-array is created that there is always an equals sign followed by a line break followed by the open bracket (as seen above).
Is there any simple looping or recursion technique for converting this data to a system-friendly data format such as arrays or JSON? I want to avoid hard-coding the names of properties. I am looking for something that will work in Python, Java, or PHP. Pseudo-code is fine, too.
I appreciate any help.
EDIT: I discovered the Pyparsing library for Python and it looks like it could be a big help. I can't find any examples for how to use Pyparsing to parse nested structures of unknown depth. Can anyone shed light on Pyparsing in terms of the data I described above?
EDIT 2: Okay, here is a working solution in Pyparsing:
def parse_file(fileName):
#get the input text file
file = open(fileName, "r")
inputText = file.read()
#define the elements of our data pattern
name = Word(alphas, alphanums+"_")
EQ,LBRACE,RBRACE = map(Suppress, "={}")
value = Forward() #this tells pyparsing that values can be recursive
entry = Group(name + EQ + value) #this is the basic name-value pair
#define data types that might be in the values
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
quotedString.setParseAction(removeQuotes)
#declare the overall structure of a nested data element
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE) #we will turn the output into a Dictionary
#declare the types that might be contained in our data value - string, real, int, or the struct we declared
value << (quotedString | struct | real | integer)
#parse our input text and return it as a Dictionary
result = Dict(OneOrMore(entry)).parseString(inputText)
return result.dump()
This works, but when I try to write the results to a file with json.dump(result), the contents of the file are wrapped in double quotes. Also, there are \n chraacters between many of the data pairs. I tried suppressing them in the code above with LineEnd().suppress() , but I must not be using it correctly.
Parsing an arbitrarily nested structure can be done with pyparsing by defining a placeholder to hold the nested part, using the Forward class. In this case, you are just parsing simple name-value pairs, where then value could itself be a nested structure containing name-value pairs.
name :: word of alphanumeric characters
entry :: name '=' value
struct :: '{' entry* '}'
value :: real | integer | quotedstring | struct
This translates to pyparsing almost verbatim. To define value, which can recursively contain values, we first create a Forward() placeholder, which can be used as part of the definition of entry. Then once we have defined all the possible types of values, we use the '<<' operator to insert this definition into the value expression:
EQ,LBRACE,RBRACE = map(Suppress,"={}")
name = Word(alphas, alphanums+"_")
value = Forward()
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
quotedString.setParseAction(removeQuotes)
struct = Group(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (quotedString | struct | real | integer)
The parse actions on real and integer will convert these elements from strings to float or ints at parse time, so that the values can be used as their actual types immediately after parsing (no need to post-process to do string-to-other-type conversion).
Your sample is a collection of one or more entries, so we use that to parse the total input:
result = OneOrMore(entry).parseString(sample)
We can access the parsed data as a nested list, but it is not so pretty to display. This code uses pprint to pretty-print a formatted nested list:
from pprint import pprint
pprint(result.asList())
Giving:
[['company', 'My Company'],
['phone', '555-5555'],
['people',
[['person',
[['name', 'Bob'],
['location', 'Seattle'],
['settings', [['size', 1], ['color', 'red']]]]],
['person',
[['name', 'Joe'],
['location', 'Seattle'],
['settings', [['size', 2], ['color', 'blue']]]]]]]]
Notice that all the strings are just strings with no enclosing quotation marks, and the ints are actual ints.
We can do just a little better than this, by recognizing that the entry format actually defines a name-value pair suitable for accessing like a Python dict. Our parser can do this with just a few minor changes:
Change the struct definition to:
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)
and the overall parser to:
result = Dict(OneOrMore(entry)).parseString(sample)
The Dict class treats the parsed contents as a name followed by a value, which can be done recursively. With these changes, we can now access the data in result like elements in a dict:
print result['phone']
or like attributes in an object:
print result.company
Use the dump() method to view the contents of a structure or substructure:
for person in result.people:
print person.dump()
print
prints:
['person', ['name', 'Bob'], ['location', 'Seattle'], ['settings', ['size', 1], ['color', 'red']]]
- location: Seattle
- name: Bob
- settings: [['size', 1], ['color', 'red']]
- color: red
- size: 1
['person', ['name', 'Joe'], ['location', 'Seattle'], ['settings', ['size', 2], ['color', 'blue']]]
- location: Seattle
- name: Joe
- settings: [['size', 2], ['color', 'blue']]
- color: blue
- size: 2
There is no "simple" way, but there are harder and not-so-hard ways. If you don't want to hardcode things, then at some point you're going to have to parse it as a structured format. That would involve parsing each line one-by-one, tokenizing it appropriately (for example, separating the key from the value correctly), and then determining how you want to deal with the line.
You may need to store your data in an intermediary format such as a (parse) tree in order to account for the arbitrary nesting relationships (represented by indents and braces), and then after you have finished parsing the data, take your resulting tree and then go through it again to get your arrays or JSON.
There are libraries available such as ANTLR that handles some of the manual work of figuring out how to write the parser.
Take a look at this code:
still_not_valid_json = re.sub (r'(\w+)=', r'"\1":', pseudo_json ) #1
this_one_is_tricky = re.compile ('("|\d)\n(?!\s+})', re.M)
that_one_is_tricky_too = re.compile ('(})\n(?=\s+\")', re.M)
nearly_valid_json = this_one_is_tricky.sub (r'\1,\n', still_not_valid_json) #2
nearly_valid_json = that_one_is_tricky_too.sub (r'\1,\n', nearly_valid_json) #3
valid_json = '{' + nearly_valid_json + '}' #4
You can convert your pseudo_json in parseable json via some substitutions.
Replace '=' with ':'
Add missing commas between simple value (like "2" or "Joe") and next field
Add missing commas between closing brace of a complex value and next field
Embrace it with braces
Still there are issues. In your example 'people' dictionary contains two similar keys 'person'. After parsing only one key remains in the dictionary. This is what I've got after parsing:{u'phone': u'555-5555', u'company': u'My Company', u'people': {u'person': {u'settings': {u'color': u'blue', u'size': 2}, u'name': u'Joe', u'location': u'Seattle'}}}
If only you could replace second occurence of 'person=' to 'person1=' and so on...
Replace the '=' with ':', Then just read it as json, add in trailing commas
Okay, I came up with a final solution that actually transforms this data into a JSON-friendly Dict as I originally wanted. It first using Pyparsing to convert the data into a series of nested lists and then loops through the list and transforms it into JSON. This allows me to overcome the issue where Pyparsing's toDict() method was not able to handle where the same object has two properties of the same name. To determine whether a list is a plain list or a property/value pair, the prependPropertyToken method adds the string __property__ in front of property names when Pyparsing detects them.
def parse_file(self,fileName):
#get the input text file
file = open(fileName, "r")
inputText = file.read()
#define data types that might be in the values
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
yes = CaselessKeyword("yes").setParseAction(replaceWith(True))
no = CaselessKeyword("no").setParseAction(replaceWith(False))
quotedString.setParseAction(removeQuotes)
unquotedString = Word(alphanums+"_-?\"")
comment = Suppress("#") + Suppress(restOfLine)
EQ,LBRACE,RBRACE = map(Suppress, "={}")
data = (real | integer | yes | no | quotedString | unquotedString)
#define structures
value = Forward()
object = Forward()
dataList = Group(OneOrMore(data))
simpleArray = (LBRACE + dataList + RBRACE)
propertyName = Word(alphanums+"_-.").setParseAction(self.prependPropertyToken)
property = dictOf(propertyName + EQ, value)
properties = Dict(property)
object << (LBRACE + properties + RBRACE)
value << (data | object | simpleArray)
dataset = properties.ignore(comment)
#parse it
result = dataset.parseString(inputText)
#turn it into a JSON-like object
dict = self.convert_to_dict(result.asList())
return json.dumps(dict)
def convert_to_dict(self, inputList):
dict = {}
for item in inputList:
#determine the key and value to be inserted into the dict
dictval = None
key = None
if isinstance(item, list):
try:
key = item[0].replace("__property__","")
if isinstance(item[1], list):
try:
if item[1][0].startswith("__property__"):
dictval = self.convert_to_dict(item)
else:
dictval = item[1]
except AttributeError:
dictval = item[1]
else:
dictval = item[1]
except IndexError:
dictval = None
#determine whether to insert the value into the key or to merge the value with existing values at this key
if key:
if key in dict:
if isinstance(dict[key], list):
dict[key].append(dictval)
else:
old = dict[key]
new = [old]
new.append(dictval)
dict[key] = new
else:
dict[key] = dictval
return dict
def prependPropertyToken(self,t):
return "__property__" + t[0]

Search and replace with finding if L or R is a prefix or suffix or nothing

Please excuse me for posting this again, but I think I really screwed up my previous thread. Because of comment blocks only allow so many characters, I could not explain myself better, and I did not see a choice for replying so that I would have more room. So if nobody minds, let me try explaining everything that I need. Basically I need to flip the names of 3D objects that have a prefix or a suffix of "L" or "R" from:
1: "L" with "R",
2: "R" with "L", or
3: don't change.
This is for a script in Maya in order to duplicate selected objects and flip there names. I got the duplicating part down packed and now it is about trying to flip the names of the duplicated objects based on 5 possibilities. Starting with the first 2 prefixes, the duplicated objects need to start with either
"L_" or "R_", match case doesn't matter.
The next 2, the suffixes, need to be either:
"_L" or "_R" with a possible extra character "_", such as "Finger_L_001".
Now in a search on this forum, I think found something almost to what I am looking for. I copied the syntax and replaced the user's search characters with mine being "L_" and "L", just to see if it would work, but with only some expectation. Since I only know the basics of regular expressions, such as "L.*" will find L_Finger_001, I really do not understand this line of syntax below and why the second choice is not leaving it as L_Finger.
So maybe this is not what I need or is it? And can someone explain this? I tried searching for keywords such as (?P) and (?P\S+), but I did not find anything. So without further due, here is the syntax....
>>> x = re.sub(r'(?P<prefix>_L)(?P<key>\S+)(?(prefix)|L_)','\g<key>',"L_Finger")
>>> x
'L_Finger'
>>> x = re.sub(r'(?P<prefix>L_)?(?P<key>\S+)(?(prefix)|_L)','\g<key>',"L_anything")
>>> x
'Finger'
#
Updated 11\10\13 3:52 PM ET
Ok, so I have tweaked the code a bit, but I like where this is going. Actually, my original idea was to use dictionaries, but I could figure out how to search. By kobejohn steering me in the right direction with defining all possibilities, this is starting to make sense. Here is a WIP
samples = ('L_Arm',
'R_Arm',
'Arm_L',
'Arm_R',
'IndexFinger_L_001',
'IndexFinger_R_001',
'_LArm')
prefix_l, prefix_r = 'L_', 'R_'
suffix_l, suffix_lIndex, suffix_r, suffix_rIndex = '_L', '_L_', '_R', '_R_'
prefix_replace = {prefix_l: prefix_r, prefix_r: prefix_l}
suffix_replace = {suffix_l: suffix_r, suffix_r: suffix_l}
suffixIndex_replace = {suffix_lIndex: suffix_rIndex, suffix_rIndex: suffix_lIndex}
results = dict()
for sample in samples:
# Default value is no modification - may be replaced below
results[sample] = sample
# Handle prefixes
prefix = prefix_replace.get(sample[:2].upper())
if prefix :
result = prefix+sample[:2]
else :
#handle the suffixes
suffix_partition = sample.rpartition("_")
result = suffix_partition[0] if suffix_partition[2].isdigit() else sample
suffix = suffix_replace.get(result[-2:])
print("Before: %s --> After: %s"%(sample, suffix))
Ok, I guess multiple regular expressions is valid too. Here is a way using re's similar to the ones you found. It assumes real prefixes (nothing before the prefix) and pseudo-suffixes (anywhere except the first characters). Below that is a parsing solution with the same assumptions.
import re
samples = ('L_Arm',
'R_Arm',
'Arm_L',
'Arm_R',
'IndexFinger_L_001',
'IndexFinger_R_001',
'_LArm')
re_with_subs = ((r'(?P<prefix>L_)(?P<poststring>\S+)',
r'R_\g<poststring>'),
(r'(?P<prefix>R_)(?P<poststring>\S+)',
r'L_\g<poststring>'),
(r'(?P<prestring>\S+)(?P<suffix>_L)(?P<poststring>\S*)',
r'\g<prestring>_R\g<poststring>'),
(r'(?P<prestring>\S+)(?P<suffix>_R)(?P<poststring>\S*)',
r'\g<prestring>_L\g<poststring>'))
results_re = dict()
for sample in samples:
# Default value is no modification - may be replaced below
results_re[sample] = sample
for pattern, substitution in re_with_subs:
result = re.sub(pattern, substitution, sample)
if result != sample:
results_re[sample] = result
break # only allow one substitution per string
for original, result in results_re.items():
print('{0} --> {1}'.format(original, result))
Here is the parsing solution.
samples = ('L_Arm',
'R_Arm',
'Arm_L',
'Arm_R',
'IndexFinger_L_001',
'IndexFinger_R_001',
'_LArm')
prefix_l, prefix_r = 'L_', 'R_'
suffix_l, suffix_r = '_L', '_R'
prefix_replacement = {prefix_l: prefix_r,
prefix_r: prefix_l}
suffix_replacement = {suffix_l: suffix_r,
suffix_r: suffix_l}
results = dict()
for sample in samples:
# Default value is no modification - may be replaced below
results[sample] = sample
# Handle prefixes
prefix = sample[:2].upper()
try:
results[sample] = prefix_replacement[prefix] + sample[2:]
continue # assume no suffixes if a prefix found
except KeyError:
pass # no valid prefix
# Handle pseudo-suffixes
start = None
for valid_suffix in (suffix_l, suffix_r):
try:
start = sample.upper().rindex(valid_suffix, 1)
break # stop if valid suffix found
except ValueError:
pass
if not start is None:
suffix = sample[start: start + 2].upper()
new_suffix = suffix_replacement[suffix]
results[sample] = sample[:start] + new_suffix + sample[start + 2:]
for original, result in results.items():
print('{0} --> {1}'.format(original, result))
gives the result:
L_Arm --> R_Arm
R_Arm --> L_Arm
IndexFinger_L_001 --> IndexFinger_R_001
Arm_L --> Arm_R
_LArm --> _LArm
IndexFinger_R_001 --> IndexFinger_L_001
Arm_R --> Arm_L
You can do it by this tricky regex pattern >>
if re.search('(^[LR]_|_[LR](_|$))', str):
str = re.sub(r'(^[LR](?=_)|(?<=_)[LR](?=(?:_|...$)))(.*)(?=.*\1(.))...$',
r'\3\2', str+"LRL")
See this demo.
Alternatively, you can do it by one line code >>
str = re.sub(r'(^[LR](?=_)|(?<=_)[LR](?=(?:_|...$)))(.*)(?=.*\1(.))...$', r'\3\2', str+"LRL")[:len(str)]
See this demo.

Analysing a text file in Python

I have a text file that needs to be analysed. Each line in the file is of this form:
7:06:32 (slbfd) IN: "lq_viz_server" aqeela#nabltas1
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS ) Albahraj#nabwmps3 (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj#nabwmps3
I need to skip the timestamp and the (slbfd) and only keep a count of the lines with the IN and OUT. Further, depending on the name in quotes, I need to increase a variable count for different variables if a line starts with OUT and decrease the variable count otherwise. How would I go about doing this in Python?
The other answers with regex and splitting the line will get the job done, but if you want a fully maintainable solution that will grow with you, you should build a grammar. I love pyparsing for this:
S ='''
7:06:32 (slbfd) IN: "lq_viz_server" aqeela#nabltas1
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS ) Albahraj#nabwmps3 (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj#nabwmps3'''
from pyparsing import *
from collections import defaultdict
# Define the grammar
num = Word(nums)
marker = Literal(":").suppress()
timestamp = Group(num + marker + num + marker + num)
label = Literal("(slbfd)")
flag = Word(alphas)("flag") + marker
name = QuotedString(quoteChar='"')("name")
line = timestamp + label + flag + name + restOfLine
grammar = OneOrMore(Group(line))
# Now parsing is a piece of cake!
P = grammar.parseString(S)
counts = defaultdict(int)
for x in P:
if x.flag=="IN": counts[x.name] += 1
if x.flag=="OUT": counts[x.name] -= 1
for key in counts:
print key, counts[key]
This gives as output:
lq_viz_server 1
OFM32 -1
Which would look more impressive if your sample log file was longer. The beauty of a pyparsing solution is the ability to adapt to a more complex query in the future (ex. grab and parse the timestamp, pull email address, parse error codes...). The idea is that you write the grammar independent of the query - you simply convert the raw text to a computer friendly format, abstracting away the parsing implementation away from it's usage.
If I consider that the file is divided into lines (I don't know if it's true) you have to apply split() function to each line. You will have this:
["7:06:32", "(slbfd)", "IN:", "lq_viz_server", "aqeela#nabltas1"]
And then I think you have to be capable of apply any logic comparing the values that you need.
i made some wild assumptions about your specification and here is a sample code to help you start:
objects = {}
with open("data.txt") as data:
for line in data:
if "IN:" in line or "OUT:" in line:
try:
name = line.split("\"")[1]
except IndexError:
print("No double quoted name on line: {}".format(line))
name = "PARSING_ERRORS"
if "OUT:" in line:
diff = 1
else:
diff = -1
try:
objects[name] += diff
except KeyError:
objects[name] = diff
print(objects) # for debug only, not advisable to print huge number of names
You have two options:
Use the .split() function of the string (as pointed out in the comments)
Use the re module for regular expressions.
I would suggest using the re module and create a pattern with named groups.
Recipe:
first create a pattern with re.compile() containing named groups
do a for loop over the file to get the lines use .match() od the
created pattern object on each line use .groupdict() of the
returned match object to access your values of interest
In the mode of just get 'er done with the standard distribution, this works:
import re
from collections import Counter
# open your file as inF...
count=Counter()
for line in inF:
match=re.match(r'\d+:\d+:\d+ \(slbfd\) (\w+): "(\w+)"', line)
if match:
if match.group(1) == 'IN': count[match.group(2)]+=1
elif match.group(1) == 'OUT': count[match.group(2)]-=1
print(count)
Prints:
Counter({'lq_viz_server': 1, 'OFM32': -1})

Pyparsing: How can I parse data and then edit a specific value in a .txt file?

my data is located in a .txt file (no, I can't change it to a different format) and it looks like this:
varaiablename = value
something = thisvalue
youget = the_idea
Here is my code so far (taken from the examples in Pyparsing):
from pyparsing import Word, alphas, alphanums, Literal, restOfLine, OneOrMore, \
empty, Suppress, replaceWith
input = open("text.txt", "r")
src = input.read()
# simple grammar to match #define's
ident = Word(alphas + alphanums + "_")
macroDef = ident.setResultsName("name") + "= " + ident.setResultsName("value") + Literal("#") + restOfLine.setResultsName("desc")
for t,s,e in macroDef.scanString(src):
print t.name,"=", t.value
So how can I tell my script to edit a specific value for a specific variable?
Example:
I want to change the value of variablename, from value to new_value.
So essentially variable = (the data we want to edit).
I probably should make it clear that I don't want to go directly into the file and change the value by changing value to new_value but I want to parse the data, find the variable and then give it a new value.
Even though you have already selected another answer, let me answer your original question, which was how to do this using pyparsing.
If you are trying to make selective changes in some body of text, then transformString is a better choice than scanString (although scanString or searchString are fine for validating your grammar expression by looking for matching text). transformString will apply token suppression or parse action modifications to your input string as it scans through the text looking for matches.
# alphas + alphanums is unnecessary, since alphanums includes all alphas
ident = Word(alphanums + "_")
# I find this shorthand form of setResultsName is a little more readable
macroDef = ident("name") + "=" + ident("value")
# define values to be updated, and their new values
valuesToUpdate = {
"variablename" : "new_value"
}
# define a parse action to apply value updates, and attach to macroDef
def updateSelectedDefinitions(tokens):
if tokens.name in valuesToUpdate:
newval = valuesToUpdate[tokens.name]
return "%s = %s" % (tokens.name, newval)
else:
raise ParseException("no update defined for this definition")
macroDef.setParseAction(updateSelectedDefinitions)
# now let transformString do all the work!
print macroDef.transformString(src)
Gives:
variablename = new_value
something = thisvalue
youget = the_idea
For this task you do not need to use special utility or module
What you need is reading lines and spliting them in list, so first index is left and second index is right side.
If you need these values later you might want to store them in dictionary.
Well here is simple way, for somebody new in python. Uncomment lines whit print to use it as debug.
f=open("conf.txt","r")
txt=f.read() #all text is in txt
f.close()
fwrite=open("modified.txt","w")
splitedlines = txt.splitlines():
#print splitedlines
for line in splitedlines:
#print line
conf = line.split('=')
#conf[0] is what it is on left and conf[1] is what it is on right
#print conf
if conf[0] == "youget":
#we get this
conf[1] = "the_super_idea" #the_idea is now the_super_idea
#join conf whit '=' and write
newline = '='.join(conf)
#print newline
fwrite.write(newline+"\n")
fwrite.close()
Actually, you should have a look at the config parser module
Which parses exactly your syntax (you need only to add [section] at the beginning).
If you insist on your implementation, you can create a dictionary :
dictt = {}
for t,s,e in macroDef.scanString(src):
dictt[t.name]= t.value
dictt[variable]=new_value
ConfigParser
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read('example.txt')
variablename = config.get('variablename', 'float')
It'll yell at you if you don't have a [section] header, though, but it's ok, you can fake one.

Categories