Python "list index out of range" error

My goal is to have a list of lists, where each inner list holds a word at its first index and the number of times that word has been seen at its second index. As an example, it should look like this:
[["test1",0],["test2",4],["test3",8]]
The only issue is that when I try to, for instance, access the word "test1" from the first inner-list, I get an index out of range error. Here is my code for how I am attempting to do this:
stemmedList = [[]]
f = open(a_document_name, 'r')
#read each line of file
fileLines = f.readlines()
for fileLine in fileLines:
    #here we end up with stopList, a list of words
    thisReview = Hw1.read_line(fileLine)['text']
    tokenList = Hw1.tokenize(thisReview)
    stopList = Hw1.stopword(tokenList)
    #for each word in stoplist, compare to all terms in return list to
    #see if it exists, if it does add one to its second parameter, else
    #add it to the list as ["word", 0]
    for word in stopList:
        #if list not empty
        if not len(unStemmedList) == 1: #for some reason I have to do this to see if list is empty, I'm assuming when it's empty it returns a length of 1 since I'm initializing it as a list of lists??
            print "List not empty."
            for innerList in unStemmedList:
                if innerList[0] == word:
                    print "Adding 1 to [" + word + ", " + str(innerList[1]) + "]"
                    innerList[1] = (innerList[1] + 1)
                else:
                    print "Adding [" + word + ", 0]"
                    unStemmedList.append([word, 0])
        else:
            print "List empty."
            unStemmedList.append([word, 0])
            print unStemmedList[len(unStemmedList)-1]
return stemmedList
The final output ends up being:
List is empty.
["test1",0]
List not empty.
It then crashes with a "list index out of range" error that points to the line if innerList[0] == word

You have a = [[]]
When you append to this list after encountering the first word, you get
a = [ [], ['test', 0] ]
In the next iteration, the inner loop reaches the empty list first and tries to access its 0th element, which doesn't exist.

Assuming that stemmedList and unStemmedList are initialized the same way:
stemmedList = [[]]
You have an empty list inside your list of lists, and it has no [0]. Instead, just initialize it to:
stemmedList = []
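A quick check makes the difference concrete. The following is only a Python 3 sketch: the names nested, flat and count_words are made up for illustration, and the counts start at 0 on first sight purely to mirror the question's convention.

```python
nested = [[]]   # one element: an empty inner list
flat = []       # truly empty

print(len(nested))  # 1, which is why the question needed "== 1" to detect "empty"
print(len(flat))    # 0

try:
    nested[0][0]    # the inner list has no element 0
    failed = False
except IndexError:
    failed = True
print(failed)       # True


def count_words(words):
    """Return [[word, count], ...], starting each count at 0 as in the question."""
    counts = []                     # start truly empty, not [[]]
    for word in words:
        for entry in counts:
            if entry[0] == word:
                entry[1] += 1
                break
        else:                       # runs only when no entry matched
            counts.append([word, 0])
    return counts


print(count_words(["a", "b", "a", "a"]))  # [['a', 2], ['b', 0]]
```

The for/else form appends a new entry once per unseen word, rather than once per non-matching inner list as the loop in the question does.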

Isn't this simpler?
counts = dict()

def plus1(key):
    if key in counts:
        counts[key] += 1
    else:
        counts[key] = 1

stoplist = "t1 t2 t1 t3 t1 t1 t2".split()
for word in stoplist:
    plus1(word)

counts
{'t2': 2, 't3': 1, 't1': 4}
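For what it's worth, the standard library already ships this pattern as collections.Counter. A minimal Python 3 sketch, reusing the same sample data:

```python
from collections import Counter

stoplist = "t1 t2 t1 t3 t1 t1 t2".split()
counts = Counter(stoplist)    # counts every element in one pass

print(counts["t1"])           # 4
print(counts.most_common(1))  # [('t1', 4)]
```

A Counter behaves like a dict, so counts.items() can be turned into the [word, count] pairs the question asks for if that shape is really needed.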

Related

How can I group a sorted list into tuples of start and endpoints of consecutive elements of this list?

Suppose that my sorted list is as such:
L = ["01-string","02-string","03-string","05-string","07-string","08-string"]
As you can see this list has been sorted. I now want the start and end points of each block of continuous strings in this list, for example, the output for this should be:
L_continuous = [("01-string", "03-string"),("05-string","05-string"),("07-string","08-string")]
So, just to clarify, I need a list of tuples and in each of these tuples I need the start and endpoint of each consecutive block in my list. So, for example, elements 0, 1 and 2 in my list are consecutive because 01,02,03 are consecutive numbers - so the start and endpoints would be "01-string" and "03-string".
The numbers 1-3 are consecutive so they form a block, whereas 5 does not have any consecutive numbers in the list so it forms a block by itself.
Not a one-liner, but something like this might work:
L = ["01-string","02-string","03-string","05-string","07-string","08-string"]
counter = None
# lastNum = None
firstString = ""
lastString = ""
L_continuous = list()
for item in L:
    currentNum = int(item[0:2])
    if counter is None:
        # startTuple
        firstString = item
        counter = currentNum
        lastString = item
        continue
    if counter + 1 == currentNum:
        # continuation of block
        lastString = item
        counter += 1
        continue
    if currentNum > counter + 1:
        # end of block
        L_continuous.append((firstString,lastString))
        firstString = item
        counter = currentNum
        lastString = item
        continue
    else:
        print ('error - not sorted or unique numbers')
# add last block
L_continuous.append((firstString,lastString))
print(L_continuous)
The first thing to do is extract an int from the string data, so that we can compare consecutive numbers:
def extract_int(s):
    return int(s.split('-')[0])
Then a straightforward solution is to keep track of the last number seen, and emit a new block when it is not consecutive with the previous one. At the end of the loop, we need to emit a block of whatever is "left over".
def group_by_blocks(strs):
    blocks = []
    last_s = first_s = strs[0]
    last_i = extract_int(last_s)
    for s in strs[1:]:
        i = extract_int(s)
        if i != last_i + 1:
            blocks.append( (first_s, last_s) )
            first_i, first_s = i, s
        last_i, last_s = i, s
    blocks.append( (first_s, last_s) )
    return blocks
Example:
>>> group_by_blocks(L)
[('01-string', '03-string'), ('05-string', '05-string'), ('07-string', '08-string')]
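Another possible approach, sketched here in Python 3, uses itertools.groupby: inside a run of consecutive numbers, the difference between each number and its position in the list stays constant. The function name group_runs is made up for this example, and the sketch assumes the input is sorted with unique numbers, as in the question.

```python
from itertools import groupby


def extract_int(s):
    return int(s.split('-')[0])


def group_runs(strs):
    blocks = []
    # (number - position) is the same for every member of a consecutive run
    for _, run in groupby(enumerate(strs),
                          key=lambda pair: extract_int(pair[1]) - pair[0]):
        members = [s for _, s in run]
        blocks.append((members[0], members[-1]))
    return blocks


L = ["01-string", "02-string", "03-string", "05-string", "07-string", "08-string"]
print(group_runs(L))
# [('01-string', '03-string'), ('05-string', '05-string'), ('07-string', '08-string')]
```

Unlike a version that reads strs[0] up front, this one also returns an empty list for empty input.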

How to print all elements of the list and not only the last one?

I wrote a script that loops over files, extracts all the floating-point numbers, and puts them in a list, which it then displays. However, only the last number found is kept in each list, whereas I would like all the numbers to be displayed together. How can I do that?
Here is my code:
final_result = []
result = []
k = listFps
k = 0
while k < len(listFps):
    with open(listFps[k], 'r') as f:
        #
        statList = f.readlines()
        statList = [x.strip() for x in statList]
        for line in statList:
            if (re.search("=", str(line))):
                if (re.search('#IND', str(line))):
                    print("ok")
                else:
                    result =re.findall("=\s*?(\d+\.\d+|\d+)", str(line))
                    print (" ca c result " ,result)
                    numberList = [float(q) for q in result]
                    print("ca c number list :",numberList)
    k+=1
It prints only the last element each time, like this:
[59.889]
[60.874]
etc.
But I actually want a single list with all the elements:
[59.889,60.874....]
Please help, I've been stuck on this for too long.
Instead of
result = re.findall….
use
result += re.findall…
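The reason this helps is that = rebinds result to a fresh list on every matching line, while += extends the list that is already there. A minimal Python 3 sketch of the accumulating version follows; the sample lines are invented stand-ins for the file contents, and number_list is a hypothetical name.

```python
import re

# Hypothetical sample lines standing in for the contents of the files
lines = ["fps = 59.889", "value = 1.#IND", "fps = 60.874", "no numbers here"]

number_list = []
for line in lines:
    if "=" not in line or "#IND" in line:
        continue  # skip lines without a value, or with an indeterminate one
    matches = re.findall(r"=\s*?(\d+\.\d+|\d+)", line)
    number_list.extend(float(m) for m in matches)  # accumulate instead of overwriting

print(number_list)  # [59.889, 60.874]
```

The raw string (r"...") keeps the backslashes in the pattern from being interpreted as string escapes.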

python list within list initialization and printing

I am trying to append an element to a list within a list that has an incremented value each time:
def get_data(file):
    matrix = [ ['one','two','three'] ] #list of lists
    test_count = 0
    line_count = 0 #keep track of which line we are on
    for line in file:
        if line.find('example') != -1: #test for example string
            temp_a = re.findall(r"\'(.+?)\'",line)[0]
            print matrix[test_count][0] #should print 'one'
            matrix[test_count][0].insert(temp_a) #should insert temp_a instead of 'one'
            test_count += 1 #go to next "new" list in the matrix
        line_count += 1 #go to next line
What I want is the result of findall to go into temp_a and from there to insert it into index 0 of the first list within a list. Then the next time findall is true, I want to insert temp_a to index 0 of the second list.
For example if the first temp_a value is 9, I would like the first list in the matrix to be:
[ [9,y,z] ]
If on the second findall my temp_a is 4, I want the matrix to become:
[ [9,y,z], [4,y,z] ]
The above code is my best attempt so far.
I have 2 questions:
1) How can I initialize a 'list of lists' if the amount of lists isn't fixed?
2) The list ['one','two','three'] was to test with printing what is going on. If I try to print out matrix[test_count][0], I get an "index out of range" error, but the moment I change it to print out matrix[0][0] it prints 'one' correctly. Is there something with the scope that I'm missing here?
To answer your questions:
1) Like this: matrix = []
Simply put, this just creates an empty list that you can append anything you want into, including more lists. So matrix.append([1,2,3]) gives you a list like this: [[1,2,3]]
2) Your index out of range error comes from the fact that you increment test_count to 1 while your matrix stays at length 1 (meaning it only has index 0), since you never append anything. To get the output that you want, you're going to need to make a few changes:
def get_data(file):
    example_list = ['one','two','three']
    matrix = [] #list of lists
    test_count = 0
    line_count = 0 #keep track of which line we are on
    for line in file:
        if line.find('example') != -1: #test for example string
            temp_a = re.findall(r"\'(.+?)\'",line)[0]
            new_list = example_list[:]
            new_list[0] = temp_a
            matrix.append(new_list)
            test_count += 1 #go to next "new" list in the matrix
        line_count += 1 #go to next line
    print matrix #[['boxes', 'two', 'three'], ['equilateral', 'two', 'three'], ['sphere', 'two', 'three']]
For 2), did you try printing test_count? Since test_count += 1 is inside the if statement, the index shouldn't go out of range before 'one' has been printed at least once.
For 1), you could do this before the insert:
if test_count == len(matrix):
    matrix.append([])
It adds a new empty list if test_count is out of range for matrix.
EDIT:
The "out of range" error is caused by the line temp_a = re.findall(r"\'(.+?)\'",line)[0] when findall can't find anything: the result is an empty list, so [0] is out of range.
def get_data(file):
    matrix = [ ['one','two','three'] ] #list of lists
    test_count = 0
    line_count = 0 #keep track of which line we are on
    for line in file:
        if line.find('example') != -1: #test for example string
            temp_a = re.findall(r"\'(.+?)\'",line)
            if temp_a:
                temp_a = temp_a[0]
            else:
                continue # do something if not found
            print(matrix[test_count][0]) #prints 'one' on the first match
            new_list = matrix[test_count][:]
            new_list[0] = temp_a
            matrix.append(new_list) #append a new row with temp_a in place of the first item
            test_count += 1 #go to next "new" list in the matrix
        line_count += 1 #go to next line
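Putting both answers together, a compact Python 3 sketch might look like the following. The name build_matrix, the template parameter, and the sample lines are all assumptions made for this illustration; it starts from an empty list, guards against findall returning nothing, and needs no manual index counter.

```python
import re


def build_matrix(lines, template=('one', 'two', 'three')):
    matrix = []                      # grows by one row per matching line
    for line in lines:
        if 'example' not in line:
            continue
        found = re.findall(r"'(.+?)'", line)
        if not found:                # no quoted text: skip instead of indexing [0]
            continue
        row = list(template)         # fresh copy so rows don't share state
        row[0] = found[0]
        matrix.append(row)
    return matrix


sample = ["example 'boxes'", "nothing here", "example with no quotes", "example 'sphere'"]
print(build_matrix(sample))
# [['boxes', 'two', 'three'], ['sphere', 'two', 'three']]
```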

How do I display 2 words before and after a search keyword in Python?

I'm very new to Python programming. How do I display the 2 words before and after a search keyword? In the example below I am searching for the word "lists".
Sample:
Line 1: List of the keyboard shortcuts for Word 2000
Line 2: Sequences: strings, lists, and tuples - PythonLearn
Desired results (the word "lists" is found only in line 2):
Line 2: Sequences: strings, lists, and tuples
Thanks for your help in this.
This solution is based on Avinash Raj's second example with these amendments:
Allows the number of words to be printed each side of the search word to be varied
Uses a list comprehension instead of if inside for, which may be considered more 'Pythonic', though I'm not sure in this case if it's more readable.
s = """List of the keyboard shortcuts for Word 2000
Sequences: strings, lists and tuples - PythonLearn"""
findword = 'lists'
numwords = 2
for i in s.split('\n'):
    z = i.split(' ')
    for x in [x for (x, y) in enumerate(z) if findword in y]:
        print(' '.join(z[max(x-numwords,0):x+numwords+1]))
Through re.findall function.
>>> s = """List of the keyboard shortcuts for Word 2000
Sequences: strings, lists, and tuples - PythonLearn"""
>>> re.findall(r'\S+ \S+ \S*\blists\S* \S+ \S+', s)
['Sequences: strings, lists, and tuples']
Without regex.
>>> s = """List of the keyboard shortcuts for Word 2000
Sequences: strings, lists, and tuples - PythonLearn"""
>>> for i in s.split('\n'):
        z = i.split()
        for x,y in enumerate(z):
            if 'lists' in y:
                print(z[x-2]+' '+z[x-1]+' '+z[x]+' '+z[x+1]+' '+z[x+2])
Sequences: strings, lists, and tuples
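One caveat with the fixed offsets above: if the keyword is among the first two words of a line, z[x-2] wraps around to the end of the list, and if it is among the last two, z[x+2] raises an IndexError. A slice avoids both problems. The following is a Python 3 sketch; the function name context and its parameters are made up for this example.

```python
def context(line, keyword, numwords=2):
    """Return each match of keyword with up to numwords words on either side."""
    words = line.split()
    results = []
    for i, word in enumerate(words):
        if keyword in word:
            start = max(i - numwords, 0)  # clamp so a negative index never wraps
            # slicing past the end of a list is safe, unlike indexing
            results.append(' '.join(words[start:i + numwords + 1]))
    return results


print(context("Sequences: strings, lists, and tuples - PythonLearn", "lists"))
# ['Sequences: strings, lists, and tuples']
print(context("lists are useful", "lists"))
# ['lists are useful']
```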
This is the solution I can think of right away for your question :-)
def get_word_list(line, keyword, length, splitter):
    word_list = line.split(keyword)
    if len(word_list) == 1:
        return []
    search_result = []
    temp_result = ""
    index = 0
    while index < len(word_list):
        result = word_list[index].strip().split(splitter, length-1)[-1]
        result += " " + keyword
        if index+1 > len(word_list):
            search_result.append(result.strip())
            break
        right_string = word_list[index+1].lstrip(" ").split(splitter, length+1)[:length]
        print word_list[index+1].lstrip(), right_string
        result += " " + " ".join(right_string)
        search_result.append(result.strip())
        index += 2
    return search_result

def search(file, keyword, length=2, splitter= " "):
    search_results = []
    with open(file, "r") as fo:
        for line in fo:
            line = line.strip()
            search_results += get_word_list(line, keyword, length, splitter)
    for result in search_results:
        print "Result:", result

Find the indices and the number of occurrences of each character in a string

A Char_Record is a 3 item list [char, total, pos_list] where
char is a one character string
total is a Nat representing the number of occurrences of char
pos_list is a list of Nat representing the indices of char
Calling the function build_char_records() should produce a sorted list with every character represented (in lowercase).
For example:
>>>build_char_records('Hi there bye')
[' ',2,[2,8]]
['b',1,[9]]
['e',3,[5,7,11]]
['h',2,[0,4]]
['i',1,[1]]
['r',1,[6]]
['t',1,[3]]
['y',1,[10]]
This is all I have written so far; I don't know how to continue. Can someone please help? Thanks.
def build_char_records(s):
    s=sorted(s)
    a=[]
    for i in range(len(s)):
I think that the other answers given thus far are better from an overall programming perspective, but based on your question I think this answer is appropriate for your skill level:
def build_char_records(phrase):
    phrase = phrase.lower()
    resultList = []
    for character in phrase: ## iterates through the phrase
        if character not in resultList:
            resultList.append(character) ## This adds each character to the list
                                         ## if it is not already in the list
    resultList.sort() ## sorts the list
    for i in range(len(resultList)): ## goes through each unique character
        character = resultList[i] ## the character in question
        tphrase = phrase ## a copy of the phrase
        num = phrase.count(character) ## the number of occurrences
        acc = 0 ## an accumulator to keep track of how many we've found
        locs = [] ## list of the locations
        while acc < num: ## while the number we've found is less than how many
                         ## there should be
            index = tphrase.find(character) ## finds the first occurrence of the character
            tphrase = tphrase[index+1:] ## chops off everything up to and including the
                                        ## character
            if len(locs) != 0: ## if we have already found at least one occurrence
                index = locs[acc-1] + index + 1 ## adjusts because we're cutting up the string
            locs.append(index) ## adds the index to the list
            acc += 1 ## increases the accumulator
        resultList[i] = [character, num, locs] ## creates the result in the proper spot
    return resultList ## returns the list of lists
print build_char_records('Hi there bye')
This will print out [[' ', 2, [2, 8]], ['b', 1, [9]], ['e', 3, [5, 7, 11]], ['h', 2, [0, 4]], ['i', 1, [1]], ['r', 1, [6]], ['t', 1, [3]], ['y', 1, [10]]]
Here is a slightly shorter, cleaner version
def build_char_records(phrase):
    phrase = phrase.lower()
    resultList = []
    for character in phrase:
        if character not in resultList:
            resultList.append(character)
    resultList.sort()
    for i in range(len(resultList)):
        tphrase = phrase
        num = phrase.count(resultList[i])
        locs = []
        for j in range(num):
            index = tphrase.find(resultList[i])
            tphrase = tphrase[index+1:]
            if len(locs) != 0:
                index = locs[j-1] + index + 1 # j replaces the accumulator of the longer version
            locs.append(index)
        resultList[i] = [resultList[i], num, locs]
    return resultList
print build_char_records('Hi there bye')
Using only list, this is what you can do:
def build_char_records(s):
    records = [] # Create a list to act as a dictionary
    for idx, char in enumerate(s):
        char = char.lower()
        current_record = None # Try to find the record in our list of records
        for record in records: # Iterate over the records
            if record[0] == char: # We find it!
                current_record = record # This is the record for current char
                break # Stop the search
        if current_record is None: # This means the list does not contain the record for this char yet
            current_record = [char, 0, []] # Create the record
            records.append(current_record) # Put the record into the list of records
        current_record[1] += 1 # Increase the count by one
        current_record[2].append(idx) # Append the position of the character into the list
    for value in records: # Iterate over the Char_Record
        print value # Print the Char_Record
Or, if you need to sort it, you can do what @Dannnno said, or, as an example, sort it this way (although you might not have learned about lambda yet):
records.sort(key=lambda x: x[0])
Just put that before printing the records.
Or, you can do it using dict and list:
def build_char_records(s):
    record = {} # Create an empty dictionary
    for idx, char in enumerate(s): # Iterate over the characters of string with the position specified
        char = char.lower() # Convert the character into lowercase
        if char not in record: # If we have not seen the character before, create the Char_Record
            record[char] = [char,0,[]] # Create a Char_Record and put it in the dictionary
        record[char][1] += 1 # Increase the count by one
        record[char][2].append(idx) # Append the position of the character into the list
    for value in record.values(): # Iterate over the Char_Record
        print value # Print the Char_Record
from collections import defaultdict

def build_char_records(s):
    cnt = defaultdict(int)
    positions = defaultdict(list)
    for i,c in enumerate(s):
        cnt[c] += 1
        positions[c].append(i)
    return [ [c, cnt[c], positions[c]] for c in cnt.keys() ]
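Note that the dictionary-based versions above neither lowercase the input in every case nor guarantee sorted output, both of which the question asks for. One way to combine the ideas is sketched below in Python 3; the function name build_sorted_char_records is made up here to keep it distinct from the answers above.

```python
def build_sorted_char_records(s):
    records = {}
    for idx, char in enumerate(s):
        char = char.lower()
        # setdefault creates the [char, total, pos_list] record on first sight
        record = records.setdefault(char, [char, 0, []])
        record[1] += 1            # one more occurrence
        record[2].append(idx)     # remember where it was seen
    # iterate over the keys in sorted order to get a sorted list of records
    return [records[char] for char in sorted(records)]


for rec in build_sorted_char_records('Hi there bye'):
    print(rec)
# [' ', 2, [2, 8]]
# ['b', 1, [9]]
# ['e', 3, [5, 7, 11]]
# ['h', 2, [0, 4]]
# ['i', 1, [1]]
# ['r', 1, [6]]
# ['t', 1, [3]]
# ['y', 1, [10]]
```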
