Python Regex groups causing index errors - python

So, my interpreter is complaining about IndexError: Replacement index 1 out of range for positional args tuple when calling group(n) or groups() on a match object under specific circumstances. The code is meant to return a phone number, such as +1 (555) 555-5555.
Here is the regex used, as it is declared elsewhere:
self.phoneRegex = re.compile(r'(\+\d) (\(\d\d\d\)) (\d\d\d)(\d\d\d\d)')
Here is the code causing the issues:
for cell in self.cells:
    if '+1' in cell.text:
        print(self.pmo.groups()) #Works fine
        print("{} {} {}-{}".format(self.pmo.groups())) #Errors out.
        print("{} {} {}-{}".format(self.pmo.group(1), self.pmo.group(2),self.pmo.group(3), self.pmo.group(4))) #Also errors out.
        if isinstance(self.cursor(row=self.data['lst_row'], column=self.telCol).value, type(None)):
            self.cursor(row=self.data['lst_row'], column=self.telCol).value = "{} {};".format("{} {} {}-{}".format(self.pmo.group(2), self.pmo.group(2),self.pmo.group(3), self.pmo.group(4)))
Full Traceback:
Traceback (most recent call last):
File "F:\Documents\Programs\Python\E45 Contact Info Puller\main.py", line 289, in run
print("{} {} {}-{}".format(self.pmo.groups()))
IndexError: Replacement index 1 out of range for positional args tuple

You have this string.format line:
print("{} {} {}-{}".format(self.pmo.groups()))
groups() on a match object returns a tuple, so here you have 4 format substitutions, but you're passing a single tuple (which contains the 4 matches from your regex) instead of 4 separate arguments for formatting.
You need to unpack (or splat) the tuple for the string formatting - notice the * added before self.pmo.groups().
print("{} {} {}-{}".format(*self.pmo.groups()))
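To see the difference in isolation, here is a minimal sketch. The pattern and the sample text are assumptions made for illustration (the pattern includes the hyphen so that it matches the sample number), not the asker's actual data.

```python
import re

# Hypothetical pattern and input, used only for this illustration
phone_regex = re.compile(r'(\+\d) (\(\d{3}\)) (\d{3})-(\d{4})')
match = phone_regex.search("Call +1 (555) 555-5555 today")

groups = match.groups()          # one tuple holding four strings

# Passing the tuple itself gives format() a single positional argument,
# so the second placeholder has nothing to fill it.
raised_index_error = False
try:
    "{} {} {}-{}".format(groups)
except IndexError:
    raised_index_error = True

# Unpacking with * passes the four strings as four separate arguments.
formatted = "{} {} {}-{}".format(*groups)
print(formatted)
```

The same unpacking works for any sequence whose length equals the number of placeholders.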

Related

Regex vs readline for text processing

I have text to process (router output) and want to generate a useful data structure from it (a dictionary with interface names as keys and packet counts as values). I have two approaches to the same task. I would like to know which one I should use for efficiency and which one is more likely to fail on bigger data samples.
ReadLine1 gets a list of lines from splitlines and, for each interface line, writes the interface name into the dictionary as the key, with the next three items as the values.
Readline2 uses the re module to match groups, and writes the dictionary keys and values from those groups.
The input self.output to these functions will be something like this:
message =
"""
Interface 1/1\n\t
input : 1234\n\t
output : 3456\n\t
dropped : 12\n
\n
Interface 1/2\n\t
input : 7123\n\t
output : 2345\n\t
dropped : 31\n\t
"""
def ReadLine1(self):
    lines = self.output.splitlines()
    for index, line in enumerate(lines):
        if "Interface" in line:
            valuelist = []
            for i in [1,2,3]:
                valuelist.append((lines[index+i].split(':'))[1].strip())
            self.IFlist[line.split()[1]] = valuelist
    return self.IFlist
def Readline2(self):
    #print repr(self.output)
    n = re.compile(r"\n*Interface (./.)\n\s*input : ([0-9]+)\n\s*output : ([0-9]+)\n\s*dropped : ([0-9]+)",re.MULTILINE|re.DOTALL)
    blocks = self.output.split('\n\n')
    for block in blocks:
        m_object = re.match(n, block)
        self.IFlist[m_object.group(1)] = [m_object.group(i) for i in (2,3,4)]
Both of your methods use specific aspects of the format to achieve the parsing you are trying to do, and if that format was changed / broken one of the methods could also break...
For example if you added a space in the empty line between the two entries (which you cannot see) then the blocks = self.output.split('\n\n') would fail to find two consecutive newline characters and the regex version would miss out on the second entry:
{'1/1': ['1234', '3456', '12']}
Or if you added an extra newline between input and output like this:
Interface 1/2
input : 7123

output : 2345
dropped : 31
The regex \s* would deal with the extra whitespace fine, but the non-regex parsing would assume that lines[index+i].split(':') has an index [1], so it would raise an IndexError with that data.
Or if you added some extra space at the end of any line, then the regex would fail to see the newline right after the content and re.match(n, block) would return None, so the next line would raise an AttributeError: 'NoneType' object has no attribute 'group'.
Or if you changed Interface to interface for one of the entries (no longer a capital I), then the regex version would raise the same error as above, but the non-regex version would simply ignore that entry.
While I was testing it, I found that the regex was easier to break with small edits to the sample message, but I also found that the version I made using a generator expression and str.partition was significantly more robust than both of them:
def readline3(self):
    gen_lines = (line for line in self.output.splitlines()
                 if line and not line.isspace())
    try:
        while True: #ended when next() throws a StopIteration
            start,_,key = next(gen_lines).partition(" ")
            if start == "Interface":
                self.IFlist[key] = [next(gen_lines).rpartition(" : ")[2]
                                    for _ in "123"]
    except StopIteration: # reached end of output
        return self.IFlist
This succeeded in every case mentioned above and a few more, and since the only method it relies on is str.partition, which always returns a 3-item tuple, there is nothing to raise any unexpected errors unless self.output is something other than a string.
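As a rough standalone illustration of why the partition-based approach tolerates messy input, the sketch below is a free function rather than a method. The name parse_interfaces, the sample text, and the extra strip() on each line are assumptions added here, not part of the answer's code.

```python
def parse_interfaces(text):
    """Map each interface name to its next three values (hypothetical helper)."""
    result = {}
    # Skip empty and whitespace-only lines; strip() is an addition in this sketch
    lines = (line.strip() for line in text.splitlines() if line.strip())
    for line in lines:
        start, _, key = line.partition(" ")
        if start == "Interface":
            # next(lines, "") returns "" instead of raising at end of input
            result[key] = [next(lines, "").rpartition(" : ")[2] for _ in range(3)]
    return result

# partition and rpartition return a 3-tuple even when the separator is missing
found = "input : 1234".rpartition(" : ")
missing = "no separator here".partition(" : ")

# Sample with a space on the "blank" line and trailing spaces on one line
sample = (
    "Interface 1/1\n\tinput : 1234\n\toutput : 3456\n\tdropped : 12\n"
    " \n"
    "Interface 1/2\n\tinput : 7123\n\toutput : 2345\n\tdropped : 31  \n"
)
parsed = parse_interfaces(sample)
print(parsed)
```

Because no step indexes into a list of unknown length, malformed lines produce odd values rather than exceptions, which may or may not be what you want.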
Also, running a benchmark with timeit, your readline1 was consistently faster than readline2, and my readline3 was usually slightly slower than readline1:
#using the default 1000000 loops using 'message'
<function readline1 at 0x100756f28>
11.225649802014232
<function readline2 at 0x1057e3950>
14.838601427007234
<function readline3 at 0x1057e39d8>
11.693351223017089

Trying to read in an external file into a dictionary [duplicate]

OK, I am trying to read an external file into a dictionary, but I am receiving some syntax errors. The clues that get read in then have to replace the letters they pair with in the list of coded words.
My code for reading into a dictionary and replacing the symbols is as follows.
d = {}
def read_clues(clues):
    global d
    with open("hey.txt") as f:
        for line in f:
            (key, val) = line[1], line[0]
            d[key] = val
def replace_symbols(clues, words):
    global d
    for word in range(len(words)):
        for key, value in d.items():
            words[word] = words[word].replace(key, value)
In the main part of my program I have the code for calling the replace_symbols. However I am getting a syntax error after print key, in the last line. The code for this is shown below.
#REPLACES LETTERS
print("======== The clues have been replaced ===========")
replace_symbols(clues, words)
for key, value in d.items():
    print key, value // This will print the symbols and letters
Assuming that hey.txt has the keys and values separated by a space, the following code should work:
def read_clues(clues):
    global d
    with open("hey.txt") as f:
        for line in f:
            stuff = line.strip().split(" ") # drop the trailing newline, then split each line into parts
            (key, val) = stuff[1], stuff[0]
            d[key] = val
If the separator is other than a space, just include it as an argument to split().
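For a version that can be run on its own, here is a minimal sketch that returns the dictionary instead of using a global. The path and separator parameters, the temporary file, and its contents are assumptions for illustration; the real layout of hey.txt is not shown in the question.

```python
import os
import tempfile

def read_clues(path, separator=None):
    """Return a dict mapping each symbol to its letter (hypothetical layout)."""
    clues = {}
    with open(path) as f:
        for line in f:
            # strip() removes the trailing newline; separator=None splits on any whitespace
            parts = line.strip().split(separator)
            if len(parts) < 2:      # skip blank or malformed lines
                continue
            letter, symbol = parts[0], parts[1]
            clues[symbol] = letter
    return clues

# Build a small throwaway clue file standing in for hey.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("A #\nM *\n\nN %\n")
    clue_path = tmp.name

clues = read_clues(clue_path)
os.remove(clue_path)
print(clues)
```

Passing separator="," (for example) would handle a comma-separated file in the same way.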
There are some other problems in your code, but since you're asking about the syntax error, it's almost certainly this line:
print key, value // This will print the symbols and letters
First, // does not mean "comment" in Python, it means "integer division". So, you're asking it to divide value by This (which would probably raise a NameError, because it's unlikely you have anything named This in your code), and then following that with a bunch of other identifiers, starting with will. A string of two identifiers in a row isn't valid syntax.
How do you write a comment in Python? Use #, not //:
print key, value # This will print the symbols and letters
Second, if you're using Python 3.x, print is a normal function, like anything else, so its arguments have to go in parentheses, like all of your other function calls. (And given the print call a few lines up, I'm willing to bet you are using Python 3.x.) Most likely you've copied this from some code for Python 2.x. There are some important differences between Python 2 and 3, which means that not all code for Python 2 can be copied and pasted into your Python 3. And this is one of the cases where it doesn't work. So:
print(key, value) # This will print the symbols and letters
But don't make that second change if you're using Python 2.x; otherwise, you'll just end up printing a tuple instead of two strings separated by a space. (For example, print 1, 2 prints 1 2, but print(1, 2) prints (1, 2).)
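A tiny Python 3 sketch of both points, using made-up variable names:

```python
total = 7
count = 2

quotient = total // count    # // is floor (integer) division
# Everything after a '#' on a line is a comment and is ignored by Python

# In Python 3, print is a function, so its arguments go in parentheses
print(total, count)
```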

How to use "def" with strings

I'm relatively new to programming, so I just recently started experimenting with "def" in Python. This is my code, and it keeps telling me the first name hasn't been defined.
def name(first, last):
    first = str(first)
    last = str(last)
    first = first.upper
    last = last,upper
    print("HELLO", first, last)
I then run the program and write a name like
name(bob, robert)
and then it tells me that "bob" hasn't been defined.
You should quote them (using ' or ") if you mean string literals:
name('bob', 'robert')
Besides that, the code needs a fix.
def name(first, last):
    first = str(first)
    last = str(last)
    first = first.upper() # Append `()` to call `upper` method.
    last = last.upper() # Replaced `,` with `.`.
    print("HELLO", first, last)
There's a difference between a variable and a string. A variable is a name bound to data that already exists in memory (a string, a number, a structure...). When you write robert without quotes, Python looks for a variable that has already been created with this name.
Here it doesn't exist, since you never wrote robert = 'something'. If you want to pass a string directly, you just have to write it surrounded by quotes (or triple quotes if it spans multiple lines).
What you want to achieve is calling your name function like this:
def name(first, last):
    first = str(first)
    last = str(last)
    first = first.upper()
    last = last.upper()
    print("HELLO %s %s" % (first, last))
name('bob', 'robert') # Will print "HELLO BOB ROBERT"
def name(first, last):
    first = str(first)
    last = str(last)
    first = first.upper()
    last = last.upper()
    print("HELLO", first, last)
name("bob","robert")
1. str objects have an upper method, but to call it and get the result you have to add "()" after the method name; without them you get a reference to the method object, not the string in upper case.
2. When calling name(bob, robert), you pass arguments that are undefined variables.
To do this you have to define these variables before the call, e.g.:
bob = "bob"
robert="robert"
name(bob,robert)
You need to put the strings to quotes (either "bob" or 'bob').
So your call would be
name('bob', 'robert')
instead of
name(bob, robert)
If you use it without the quotes, Python tries to find a variable named bob.
Also, you do not need str(first) or str(last), since both are already strings.
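The points made in these answers can be checked together in one short sketch. Returning the greeting from name is an addition made here so the result can be inspected; it is not in the original code.

```python
def name(first, last):
    # Call upper() with parentheses so we get strings, not method objects
    greeting = "HELLO {} {}".format(first.upper(), last.upper())
    print(greeting)
    return greeting              # returned only so the result can be inspected

greeting = name("bob", "robert")  # quoted, so these are string literals

method_reference = "bob".upper    # no parentheses: a bound method object
called_result = "bob".upper()     # with parentheses: the uppercased string

# Without quotes, Python looks up variables named bob and robert
try:
    name(bob, robert)
except NameError as exc:
    name_error = str(exc)
```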

pl/python TypeError: sequence item 21: expected string, int found

Friends: in PostgreSQL plpython, I am trying to do an iterative search/replace in a text block 'data'.
I use re.sub to define a match pattern, which then calls a function 'replace' to do the work.
The objective is to have the 'replace' function called repeatedly, as some replacements generate further 'rule' matches, which require further replacements.
All works well through many, many replacements, and I'm managing to trigger the second pass of the repeat loop. Then something causes the regex pattern to return an integer(?), apparently at the point where it finds no matches. I've tried testing for 'None' and '0', with no luck. Ideas?
data = (a_huge_block of_text)
# ====================== THE FUNCTION ==============
def replace(matchobj):
    tag = matchobj.group(1)
    plpy.info("-------- matchobj.group(1), tag: ", tag)
    if matchobj.group(1) != '':
        (do all the replacement work in here)
# ====================== END FUNCTION ==============
passnumber = 0
# If _any_ pattern match is found, process all of data for _all_ matches:
while re.search('(rule:[A-Za-z#]+)', data) != '':
    # BEGIN repeat loop:
    passnumber = passnumber + 1
    plpy.info(' ================================ BEGIN PASS: ', passnumber)
    data = re.sub('(rule:[A-Za-z#]+)', replace, data)
    plpy.info(' =================================== END PASS: ', passnumber)
Above code seems to be running OK, into a second iteration... then:
ERROR: TypeError: sequence item 21: expected string, int found
CONTEXT: Traceback (most recent call last):
PL/Python function "myfunction", line 201, in <module>
data = re.sub('(rule:[A-Za-z#]+)', replace, data)
PL/Python function "myfunction", line 150, in sub
PL/Python function "myfunction"
Have also tried re.search (...) != '' -- and re.search (...) != 'None' --- with same result.
I do realize I must find the syntax to represent the match object in some readable form...
The answer to this turned out to be quite simple, of course, once you know Python! (I don't!)
To initiate the repeat loop, I had been doing this test:
while re.search('(rule:[A-Za-z#]+)', data) != '':
Had also tried this one, which will also not work:
while re.search('(rule:[A-Za-z#]+)', data) != 'None':
The None result can be trapped, of course, but the quotes are not needed. It's as simple as that:
while re.search('(rule:[A-Za-z#]+)', data) != None:
It's all so simple, once you know!
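For reference, here is a minimal sketch of the repeat loop outside PL/Python. The rule names, the EXPANSIONS table, and the sample data are hypothetical; the sketch also uses the more idiomatic `is not None` comparison and makes sure the replacement function always returns a string, which re.sub requires of a callable replacement.

```python
import re

RULE_PATTERN = re.compile(r'(rule:[A-Za-z#]+)')

# Hypothetical rule table: one replacement produces a further rule match
EXPANSIONS = {
    "rule:outer": "[rule:inner]",
    "rule:inner": "done",
}

def replace(matchobj):
    tag = matchobj.group(1)
    # Always return a string; unknown tags are rewritten so the loop terminates
    return EXPANSIONS.get(tag, "unknown")

data = "start rule:outer end"
passnumber = 0

# re.search returns a match object or None, never '' or the string 'None'
while RULE_PATTERN.search(data) is not None:
    passnumber += 1
    data = RULE_PATTERN.sub(replace, data)

print(passnumber, data)
```

Comparing the search result with '' is always true, because a match object and None are both different from an empty string, so that form of the loop never ends on its own.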

Python: needs more than 1 value to unpack

What am I doing wrong to get this error?
replacements = {}
replacements["**"] = ("<strong>", "</strong>")
replacements["__"] = ("<em>", "</em>")
replacements["--"] = ("<blink>", "</blink>")
replacements["=="] = ("<marquee>", "</marquee>")
replacements["##"] = ("<code>", "</code>")
for delimiter, (open_tag, close_tag) in replacements: # error here
    message = self.replaceFormatting(delimiter, message, open_tag, close_tag);
The error:
Traceback (most recent call last):
File "", line 1, in
for doot, (a, b) in replacements:
ValueError: need more than 1 value to unpack
All the value tuples have two values, right?
It should be:
for delimiter, (open_tag, close_tag) in replacements.iteritems(): # or .items() in py3k
I think you need to call .items(), like the third example in this link:
for delimiter, (open_tag, close_tag) in replacements.items():
    message = self.replaceFormatting(delimiter, message, open_tag, close_tag)
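The reason for the error is that iterating over a dictionary directly yields only its keys, so Python tries to unpack a key such as "**" into delimiter and (open_tag, close_tag). A minimal Python 3 sketch follows; replace_formatting is a hypothetical stand-in for self.replaceFormatting, whose real behavior is not shown in the question.

```python
replacements = {
    "**": ("<strong>", "</strong>"),
    "__": ("<em>", "</em>"),
}

def replace_formatting(delimiter, message, open_tag, close_tag):
    """Hypothetical stand-in: wrap text between pairs of delimiters in tags."""
    parts = message.split(delimiter)
    out = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        # Odd-numbered delimiters open a tag, even-numbered ones close it
        out += (open_tag if i % 2 else close_tag) + part
    return out

# Iterating the dict itself yields keys only, so the nested unpacking fails
unpack_failed = False
try:
    for delimiter, (open_tag, close_tag) in replacements:
        pass
except ValueError:
    unpack_failed = True

# .items() yields (key, value) pairs, which match the nested target
message = "a **bold** and __em__ word"
for delimiter, (open_tag, close_tag) in replacements.items():
    message = replace_formatting(delimiter, message, open_tag, close_tag)

print(message)
```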
