Is there a simple string parser for Python?

Is there a simple string parser for Python? - python

I have a string like that: 'aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'. From that string I want to get the following structure: ['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']. In other words I would like to separate substrings that are in brackets of different types.

You need to use a stack approach to track nesting levels:
pairs = {'{': '}', '[': ']', '(': ')'}
def parse_groups(string):
stack = []
last = 0
for i, c in enumerate(string):
if c in pairs:
# push onto the stack when we find an opener
if not stack and last < i:
# yield anything *not* grouped
yield string[last:i]
stack.append((c, i))
elif c in pairs:
if stack and pairs[stack[-1][0]] == c:
# Found a closing bracket, pop the stack
start = stack.pop()[1]
if not stack:
# Group fully closed, yield
yield string[start:i + 1]
last = i + 1
else:
raise ValueError('Missing opening parethesis')
if stack:
raise ValueError('Missing closing parethesis')
if last < len(string):
# yield the tail
yield string[last:]
This will generate groups, cast to a list if you need one:
>>> list(parse_groups('aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'))
['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']
If the brackets / parenthesis do not balance, an exception is raised:
>>> list(parse_groups('aa(bb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 19, in parse_groups
ValueError: Missing closing parethesis
>>> list(parse_groups('aa[{bb}}'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis
>>> list(parse_groups('aa)bb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis

You could also look at pyparsing. Interestingly, this can be implemented as a stack, where you can push string fragments when you find {[( and pop when you find )]}.

I think you could try Custom String Parser library (I'm the author of it). It's designed to work with data which has any logical structure, so you can customize it the way you want ;)

Related

Strange error in Python (IndexError)

Sorry if this is a dumb question but I am having a few issues with my code.
I have a Python script that scrapes Reddit and sets the top picture as my desktop background.
I want it to only download if the picture is big enough, but I am getting a strange error.
>>> m = '1080x608'
>>> w = m.rsplit('x', 1)[0]
>>> print(w)
1080
>>> h = m.rsplit('x', 1)[1]
>>> print(h)
608
This works fine, but the following doesn't, despite being almost the same.
>>> m = '1280×721'
>>> w = m.rsplit('x', 1)[0]
>>> h = m.rsplit('x', 1)[1]
Traceback (most recent call last):
File "<pyshell#35>", line 1, in <module>
h = m.rsplit('x', 1)[1]
IndexError: list index out of range

In your second example × is not the same as x, it is instead a multiplication sign. If you are getting these stings from somewhere and then parsing them, you should first do
m = m.replace('×', 'x')

× != x. Split returns one-element list, and you are trying to retrieve second element from it.
'1080x608'.rsplit('x', 1) # ['1080', '608']
'1280×721'.rsplit('x', 1) # ['1280\xc3\x97721']
In second case there is no second element in list - it contains only one element.
MCVE would be:
l = ['something']
l[1]
With exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
To ensure that you split string to two parts you may use a partition.
Split the string at the first occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and
the part after the separator. If the separator is not found, return a
3-tuple containing the string itself, followed by two empty strings.
w, sep, h = m.partition('x')
# h and sep will be empty if there is no separator in m

python regular expression error

I am trying do a pattern search and if match then set a bitarray on the counter value.
runOutput = device[router].execute (cmd)
runOutput = output.split('\n')
print(runOutput)
for this_line,counter in enumerate(runOutput):
print(counter)
if re.search(r'dev_router', this_line) :
#want to use the counter to set something
Getting the following error:
if re.search(r'dev_router', this_line) :
2016-07-15T16:27:13: %ERROR: File
"/auto/pysw/cel55/python/3.4.1/lib/python3.4/re.py", line 166,
in search 2016-07-15T16:27:13: %-ERROR: return _compile(pattern,
flags).search(string)
2016-07-15T16:27:13: %-ERROR: TypeError: expected string or buffer

You mixed up the arguments for enumerate() - first goes the index, then the item itself. Replace:
for this_line,counter in enumerate(runOutput):
with:
for counter, this_line in enumerate(runOutput):
You are getting a TypeError in this case because this_line is an integer and re.search() expects a string as a second argument. To demonstrate:
>>> import re
>>>
>>> this_line = 0
>>> re.search(r'dev_router', this_line)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
By the way, modern IDEs like PyCharm can detect this kind of problems statically:
(Python 3.5 is used for this screenshot)

Python doesn't allow me to do match.group() with regex?

I wrote a regex in Python to just get the digits from a string. However, when I run match.group(), it says that the object list has no attribute group. What am I doing wrong? My code as typed pasted into the terminal, and the terminal's response. Thanks.
>>> #import regex library
... import re
>>>
>>> #use a regex to just get the numbers -- not the rest of the string
... matcht = re.findall(r'\d', dwtunl)
>>> matchb = re.findall(r'\d', ambaas)
>>> #macht = re.search(r'\d\d', dwtunl)
...
>>> #just a test to see about my regex
... print matcht.group()
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> print matchb.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'group'
>>>
>>> #start defining the final variables
... if dwtunl == "No Delay":
... dwtunnl = 10
... else:
... dwtunnl = matcht.group()
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> if ambaas == "No Delay":
... ammbaas = 10
... else:
... ammbaas = matchb.group()
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
AttributeError: 'list' object has no attribute 'group'

re.findall() doesn't return a match object (or a list of them), it always returns a list of strings (or a list of tuples of strings, in the case of there being more than one capturing group). And a list doesn't have a .group() method.
>>> import re
>>> regex = re.compile(r"(\w)(\W)")
>>> regex.findall("A/1$5&")
[('A', '/'), ('1', '$'), ('5', '&')]
re.finditer() will return an iterator that yields one match object per match.
>>> for match in regex.finditer("A/1$5&"):
... print match.group(1), match.group(2)
...
A /
1 $
5 &

Because re.findall(r'\d', ambaas) returns a list. You can iterate over the list, like:
for i in stored_list
Or just stored_list[0].

re.findall() returns a list of strings, not a match object. You might mean re.finditer.

Shorthand for defining PyParsing field names

I have a few pyparsing tokens defined as follows:
field = Word(alphas + "_").setName("field")
Is there really no shorthand for this?
Furthermore, this does not seem to work, the dictionary returned by expression.parseString() is always an empty one.

You are confusing setName and setResultsName. setName assigns a name to the expression so that exception messages are more meaningful. Compare:
>>> integer1 = Word(nums)
>>> integer1.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected W:(0123...) (at char 0), (line:1, col:1)
and:
>>> integer2 = Word(nums).setName("integer")
>>> integer2.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected integer (at char 0), (line:1, col:1)
setName gives a name to the expression itself.
setResultsName on the other hand gives a name to the parsed data that is returned, like named fields in a regex.
>>> expr = integer.setResultsName('age') + integer.setResultsName('credits')
>>> data = expr.parseString('20 110')
>>> print data.dump()
['20', '110']
- age: 20
- credits: 110
And as #Kimvais has mentioned, there is a shortcut for setResultsName:
>>> expr = integer('age') + integer('credits')
Note also that setResultsName returns a copy of the expression - that is the only way that using the same expression multiple times with different names works.

field = Word(alphas + "_")("field")
seems to work.

Python: needs more than 1 value to unpack

What am I doing wrong to get this error?
replacements = {}
replacements["**"] = ("<strong>", "</strong>")
replacements["__"] = ("<em>", "</em>")
replacements["--"] = ("<blink>", "</blink>")
replacements["=="] = ("<marquee>", "</marquee>")
replacements["##"] = ("<code>", "</code>")
for delimiter, (open_tag, close_tag) in replacements: # error here
message = self.replaceFormatting(delimiter, message, open_tag, close_tag);
The error:
Traceback (most recent call last):
File "", line 1, in
for doot, (a, b) in replacements: ValueError: need more than 1 value to
unpack
All the values tuples have two values. Right?

It should be:
for delimiter, (open_tag, close_tag) in replacements.iteritems(): # or .items() in py3k

I think you need to call .items() like the third example in this link
for delimiter, (open_tag, close_tag) in replacements.items(): # error here
message = self.replaceFormatting(delimiter, message, open_tag, close_tag)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Is there a simple string parser for Python? - python

I have a string like that: 'aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'. From that string I want to get the following structure: ['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']. In other words I would like to separate substrings that are in brackets of different types.

You could also look at pyparsing. Interestingly, this can be implemented as a stack, where you can push string fragments when you find {[( and pop when you find )]}.

I think you could try Custom String Parser library (I'm the author of it). It's designed to work with data which has any logical structure, so you can customize it the way you want ;)

Related

Strange error in Python (IndexError)

python regular expression error

Python doesn't allow me to do match.group() with regex?

Shorthand for defining PyParsing field names

Python: needs more than 1 value to unpack

Categories

Resources