I have a few pyparsing tokens defined as follows:
field = Word(alphas + "_").setName("field")
Is there really no shorthand for this?
Furthermore, this does not seem to work: the dictionary returned by expression.parseString() is always empty.
You are confusing setName and setResultsName. setName assigns a name to the expression so that exception messages are more meaningful. Compare:
>>> integer1 = Word(nums)
>>> integer1.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected W:(0123...) (at char 0), (line:1, col:1)
and:
>>> integer2 = Word(nums).setName("integer")
>>> integer2.parseString('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected integer (at char 0), (line:1, col:1)
setName gives a name to the expression itself.
setResultsName on the other hand gives a name to the parsed data that is returned, like named fields in a regex.
>>> integer = Word(nums)
>>> expr = integer.setResultsName('age') + integer.setResultsName('credits')
>>> data = expr.parseString('20 110')
>>> print data.dump()
['20', '110']
- age: 20
- credits: 110
And as @Kimvais has mentioned, there is a shortcut for setResultsName:
>>> expr = integer('age') + integer('credits')
Note also that setResultsName returns a copy of the expression - that is the only way that using the same expression multiple times with different names works.
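To make the comparison with named fields in a regex concrete, here is a minimal sketch using only the standard library's re module. It illustrates the analogy rather than pyparsing itself; the pattern and the sample input '20 110' are just assumptions chosen to mirror the example above.

```python
import re

# Named groups play the same role as results names: they label the parsed data.
pattern = re.compile(r'(?P<age>\d+)\s+(?P<credits>\d+)')
match = pattern.match('20 110')

print(match.groups())       # ('20', '110')
print(match.groupdict())    # {'age': '20', 'credits': '110'}
print(match.group('age'))   # 20
```

An unnamed group still matches, but it only shows up positionally, which is roughly what happens when a pyparsing expression has no results name.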
field = Word(alphas + "_")("field")
seems to work.
I am trying to do a pattern search and, if a line matches, set a bit in a bitarray at the counter value.
runOutput = device[router].execute (cmd)
runOutput = output.split('\n')
print(runOutput)
for this_line,counter in enumerate(runOutput):
    print(counter)
    if re.search(r'dev_router', this_line) :
        #want to use the counter to set something
Getting the following error:
if re.search(r'dev_router', this_line) :
2016-07-15T16:27:13: %ERROR: File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/re.py", line 166, in search
2016-07-15T16:27:13: %-ERROR: return _compile(pattern, flags).search(string)
2016-07-15T16:27:13: %-ERROR: TypeError: expected string or buffer
You mixed up the order of the values that enumerate() yields: the index comes first, then the item itself. Replace:
for this_line,counter in enumerate(runOutput):
with:
for counter, this_line in enumerate(runOutput):
You are getting a TypeError in this case because this_line is an integer and re.search() expects a string as a second argument. To demonstrate:
>>> import re
>>>
>>> this_line = 0
>>> re.search(r'dev_router', this_line)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
By the way, modern IDEs like PyCharm can detect this kind of problem statically.
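Putting the corrected unpacking order together, here is a minimal sketch of the loop. The function name matching_line_flags, the sample text, and the use of a plain list of booleans in place of a real bitarray are all assumptions made for illustration.

```python
import re

def matching_line_flags(text, pattern=r'dev_router'):
    """Return one boolean per line: True where the pattern is found."""
    lines = text.split('\n')
    flags = [False] * len(lines)
    # enumerate() yields (index, item), so the index is unpacked first
    for counter, this_line in enumerate(lines):
        if re.search(pattern, this_line):
            flags[counter] = True
    return flags

# Hypothetical stand-in for the device output in the question
sample_output = "dev_router eth0\nlo0 up\ndev_router eth1"
print(matching_line_flags(sample_output))  # [True, False, True]
```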
I have written following script to get rid of non-alphanumeric characters, and get them back afterwards. However I can't seem to figure out why the unhexlify will not work. Any suggestions?
import binascii, timeit, re
damn_string = "asjke5234nlkfs$sfj3.$sfjk."
def convert_string(s):
    return ''.join('__UTF%s__' % binascii.hexlify(c.encode('utf-16')) if not c.isalnum() else c for c in s.lower())
def convert_back(s):
    for i in re.findall('__UTF([a-f0-9]{8})__', s): # For testing
        print binascii.unhexlify(i).decode('utf-16')
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
convert = convert_string(damn_string)
print convert
print convert_back(convert)
This results in the following output:
asjke5234nlkfs__UTFfffe2400__sfj3__UTFfffe2e00____UTFfffe2400__sfjk__UTFfffe2e00__
$
.
$
.
Traceback (most recent call last):
File "test.py", line 131, in <module>
print convert_back(convert)
File "test.py", line 127, in convert_back
return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
TypeError: Odd-length string
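My bad. It took me a bit too long to realize that re.sub cannot substitute the group string in this manner: binascii.unhexlify('\g<1>') is evaluated before re.sub is even called, so it receives the literal five-character text \g<1> rather than the matched group, hence the "Odd-length string" error. One way of doing this is to pass a function as the replacement: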
return re.sub('__UTF([a-f0-9]{8})__', lambda x: binascii.unhexlify(x.group(1)).decode('utf-16'), s)
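For reference, a minimal Python 3 sketch of the full round trip with a function as the replacement. The question's code is Python 2, so the extra .decode('ascii') after hexlify and the compiled TOKEN pattern are adaptations of mine, not part of the original script.

```python
import binascii
import re

TOKEN = re.compile(r'__UTF([a-f0-9]{8})__')

def convert_string(s):
    # Replace each non-alphanumeric character with a hex-encoded token.
    return ''.join(
        c if c.isalnum()
        else '__UTF%s__' % binascii.hexlify(c.encode('utf-16')).decode('ascii')
        for c in s.lower()
    )

def convert_back(s):
    # re.sub calls the function once per match and inserts its return value.
    return TOKEN.sub(
        lambda m: binascii.unhexlify(m.group(1)).decode('utf-16'), s)

original = "asjke5234nlkfs$sfj3.$sfjk."
encoded = convert_string(original)
print(convert_back(encoded) == original)
```

Note that convert_string lowercases its input, so the round trip only reproduces strings that were lowercase to begin with.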
I wrote a regex in Python to get just the digits from a string. However, when I run match.group(), it says that the list object has no attribute group. What am I doing wrong? Below is my code as pasted into the terminal, along with the terminal's response. Thanks.
>>> #import regex library
... import re
>>>
>>> #use a regex to just get the numbers -- not the rest of the string
... matcht = re.findall(r'\d', dwtunl)
>>> matchb = re.findall(r'\d', ambaas)
>>> #macht = re.search(r'\d\d', dwtunl)
...
>>> #just a test to see about my regex
... print matcht.group()
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> print matchb.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'group'
>>>
>>> #start defining the final variables
... if dwtunl == "No Delay":
...     dwtunnl = 10
... else:
...     dwtunnl = matcht.group()
...
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> if ambaas == "No Delay":
...     ammbaas = 10
... else:
...     ammbaas = matchb.group()
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
AttributeError: 'list' object has no attribute 'group'
re.findall() doesn't return a match object (or a list of them), it always returns a list of strings (or a list of tuples of strings, in the case of there being more than one capturing group). And a list doesn't have a .group() method.
>>> import re
>>> regex = re.compile(r"(\w)(\W)")
>>> regex.findall("A/1$5&")
[('A', '/'), ('1', '$'), ('5', '&')]
re.finditer() will return an iterator that yields one match object per match.
>>> for match in regex.finditer("A/1$5&"):
...     print match.group(1), match.group(2)
...
A /
1 $
5 &
That is because re.findall(r'\d', ambaas) returns a list. You can iterate over the list, like:
for i in stored_list:
Or just take the first element with stored_list[0].
re.findall() returns a list of strings, not a match object. You might mean re.finditer.
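As a minimal sketch of how the original goal could be written with re.search, which does return a match object (or None): the helper name first_number, the sample strings, and the choice of \d+ to grab a whole run of digits are my assumptions. The default of 10 mirrors the "No Delay" case in the question.

```python
import re

def first_number(text, default=10):
    """Return the first run of digits in text as an int, or default if none."""
    match = re.search(r'\d+', text)  # a match object, or None when nothing matches
    if match is None:
        return default
    return int(match.group())

print(first_number("Delay 15 min"))              # 15
print(first_number("No Delay"))                  # 10
print(re.findall(r'\d+', "5 min, then 20 min"))  # ['5', '20']
```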
I have the following code in my Python script to launch an application and grab its output.
An example of this output would be 'confirmed : 0'
Now I only want to know the number, in this case zero, but normally this number is a float, like 0.005464
When I run this code, it tells me it cannot convert "0" to float. What am I doing wrong?
This is the error I get now:
ValueError: could not convert string to float: "0"
cmd = subprocess.Popen('/Applications/Electrum.app/Contents/MacOS/Electrum getbalance', shell=True, stdout=subprocess.PIPE)
for line in cmd.stdout:
    if "confirmed" in line:
        a,b=line.split(': ',1)
        if float(b)>0:
            print "Positive amount"
        else:
            print "Empty"
According to the exception you got, the value contained in b is not 0, but "0" (including the quotes), and therefore cannot be converted to a float directly. You'll need to remove the quotes first, e.g. with float(b.strip('"')).
As can be seen in the following examples, the exception description does not add the quotes, so they must have been part of the original string:
>>> float('"0"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: "0"
>>> float('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: a
I have tested the code and found that the second element returned by split(': ', 1) can still contain non-numeric text, which float() cannot convert:
>>> line = "1: 456: confirmed"
>>> "confirmed" in line
True
>>> a,b=line.split(': ', 1)
>>> a
'1'
>>> b
'456: confirmed'
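A minimal sketch of a more forgiving conversion, assuming the value follows the first ': ' and may be wrapped in double quotes or followed by a comma. The helper name parse_amount and the sample lines are hypothetical, not actual Electrum output.

```python
def parse_amount(line):
    """Extract the text after the first ': ' and convert it to a float."""
    _, _, value = line.partition(': ')
    # Strip whitespace, double quotes and a trailing comma before converting
    return float(value.strip(' \t\r\n",'))

print(parse_amount('"confirmed": "0"'))        # 0.0
print(parse_amount('confirmed : 0.005464\n'))  # 0.005464
```

If the remaining text is still not numeric, as in the '456: confirmed' case above, float() raises ValueError, which the caller can catch.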
I have a string like this: 'aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'. From that string I want to get the following structure: ['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']. In other words, I would like to separate the substrings that are enclosed in brackets of different types.
You need to use a stack approach to track nesting levels:
pairs = {'{': '}', '[': ']', '(': ')'}
def parse_groups(string):
    stack = []
    last = 0
    for i, c in enumerate(string):
        if c in pairs:
            # push onto the stack when we find an opener
            if not stack and last < i:
                # yield anything *not* grouped
                yield string[last:i]
            stack.append((c, i))
        elif c in pairs.values():
            if stack and pairs[stack[-1][0]] == c:
                # Found a closing bracket, pop the stack
                start = stack.pop()[1]
                if not stack:
                    # Group fully closed, yield
                    yield string[start:i + 1]
                    last = i + 1
            else:
                raise ValueError('Missing opening parethesis')
    if stack:
        raise ValueError('Missing closing parethesis')
    if last < len(string):
        # yield the tail
        yield string[last:]
This will generate groups, cast to a list if you need one:
>>> list(parse_groups('aaa(cc(kkk)c)ddd[lll]{m(aa)mm}'))
['aaa', '(cc(kkk)c)', 'ddd', '[lll]', '{m(aa)mm}']
If the brackets or parentheses do not balance, an exception is raised:
>>> list(parse_groups('aa(bb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 19, in parse_groups
ValueError: Missing closing parethesis
>>> list(parse_groups('aa[{bb}}'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis
>>> list(parse_groups('aa)bb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 20, in parse_groups
ValueError: Missing opening parethesis
You could also look at pyparsing. Interestingly, this can be implemented as a stack, where you can push string fragments when you find {[( and pop when you find )]}.
I think you could try Custom String Parser library (I'm the author of it). It's designed to work with data which has any logical structure, so you can customize it the way you want ;)