Programming error with string index out of range python - python

Every time I try to run my code and open a specific text document which is fairly long, it says string index out of range. I was wondering if any of you could help me solve this problem. I have to get my code to spit out more than what I have it doing right now, but this is a start and I didn't figure it wise to carry on without fixing the initial error.
file = open(input("Enter File Name:"))
lines = file.readlines()
file.close()
number_chars = 0
number_words = 0
largest_word = ""
smallest_word = ""
count_words = 0
largest = int(0)
smallest = int(999999)
average = 0
count = 0
sum = 0
for item in lines:
    is_word = False
    item = item.strip()
    char = item[0]
    count_words = count_words + 1
    if char >= 'a' and char <= 'z':
        is_word = True
    elif char >= 'A' and char <= 'Z':
        is_word = True
    if is_word == True:
        if(len(largest_word) < len(item)):
            largest_word = item
        if(len(smallest_word) > len(item)):
            smallest_word = item
        number_chars += len(item)
    else:
        item = int(item)
        count = count + 1
        sum = sum + item
        if(count == 1 or item > largest):
            largest = item
        if(count == 1 or item < smallest):
            smallest = item
print("Reading", "name_list.txt")
print("found",count,"numbers")
print("largest number :", largest)
print("smallest numer :", smallest)
Average = sum // count
print("Average :", Average)
print(largest_word)
print(smallest_word)

The first thing I'd be thinking about is the effect an empty line is going to have on the statements (primarily the last one):
file = open(input("Enter File Name:"))
lines = file.readlines()
for item in lines:
    item = item.strip()
    char = item[0]
An empty line will not have an index zero.
In terms of fixing it, it depends on your intent. From a casual glance, it appears that you treat each line as either a word or a number (though you may want to consider what happens if the line starts with :, for example - that won't be considered a word by your code and it's almost certainly going to cause problems when treating it as an integer).
However, putting that aside for now: since a blank line is neither a word nor a number, you may find the simplest fix is to simply ignore those blank lines totally, with something like:
    item = item.strip() # This line already exists.
    if item == "":
        continue
A blank item will simply cycle around and get the next line.
However, as mentioned, this will only fix your immediate problem. You really should consider handling things that are not in the set {word, number}.
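One way to handle that broader case, as a minimal sketch only: skip blank lines, treat purely alphabetic lines as words, try to parse everything else as an integer, and collect whatever fails to parse. The function name `classify_lines` and the use of `str.isalpha` on the whole line (rather than checking only the first character, as the question's code does) are assumptions made for this illustration.

```python
def classify_lines(lines):
    """Split raw lines into words, integers, and everything else."""
    words, numbers, others = [], [], []
    for raw in lines:
        item = raw.strip()
        if item == "":
            continue  # blank line: neither a word nor a number
        if item.isalpha():
            words.append(item)
            continue
        try:
            numbers.append(int(item))
        except ValueError:
            others.append(item)  # e.g. ":oops" or "12x"
    return words, numbers, others


words, numbers, others = classify_lines(["apple\n", "\n", "42\n", ":oops\n", "  7 \n"])
print(words, numbers, others)
```

Keeping the unparseable lines in their own list makes it easy to report them instead of crashing on the first one.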

Related

CS50 DNA works for small.csv but not for large

I am having problems with CS50 pset6 DNA. It gets all the right values and gives correct answers when I use the small.csv file, but not when I use the large one. I have been going through it with debug50 for over a week and can't figure out the problem. I assume the problem is somewhere in the loop through the samples that finds the STRs, but I just don't see what it is doing wrong when walking through it.
If you are unfamiliar with the CS50 DNA problem set, the code is supposed to look through a DNA sequence (argv[2]) and compare it with a CSV file (argv[1]) containing people's DNA STRs to figure out which person (if any) it belongs to.
Note: my code fails in this case, if it helps: python dna.py databases/large.csv sequences/5.txt
from sys import argv
from csv import reader
#ensures correct number of arguments
if (len(argv) != 3):
    print("usage: python dna.py data sample")
#dict for storage
peps = {}
#storage for strands we look for.
types = []
#opens csv table
with open(argv[1],'r') as file:
    data = reader(file)
    line = 0
    number = 0
    for l in data:
        if line == 0:
            for col in l:
                if col[2].islower() and col != 'name':
                    break
                if col == 'name':
                    continue
                else:
                    types.append(col)
            line += 1
        else:
            row_mark = 0
            for col in l:
                if row_mark == 0:
                    peps[col] = []
                    row_mark += 1
                else:
                    peps[l[0]].append(col)
#convert sample to string
samples = ""
with open(argv[2], 'r') as sample:
    for c in sample:
        samples = samples + c
#DNA STR GROUPS
dna = { "AGATC" : 0,
        "AATG" : 0,
        "TATC" : 0,
        "TTTTTTCT" : 0,
        "TCTAG" : 0,
        "GATA" : 0,
        "GAAA" : 0,
        "TCTG" : 0 }
#go through all the strs in dna
for keys in dna:
    #the longest run of sequence
    longest = 0
    #the current run of sequences
    run = 0
    size = len(keys)
    #look through sample for longest
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            #ensure the code does not go outside len of samples
            if ((i + size) < len(samples)):
                i = i + size
                continue
        if run > longest:
            longest = run
            run = 0
        i += 1
    dna[keys] = longest
#see who it is
positive = True
person = ''
for key in peps:
    positive = True
    for entry in types:
        x = types.index(entry)
        test = dna.get(entry)
        can = int(peps.get(key)[x])
        if (test != can):
            positive = False
    if positive == True:
        person = key
        break
if person != '':
    print(person)
else:
    print("No match")
The problem is in this while loop. Look at this code carefully.
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
            continue
    if run > longest:
        longest = run
        run = 0
    i += 1
There is a piece of logic missing here. You are supposed to find the longest consecutive run of a DNA sequence. So when the sequence repeats back to back, you need to count how many times it repeats. Only when it is no longer repeated should you check whether this run is the longest so far.
Solution
You need to add an else branch after the if hold == keys: statement. This would be the right fix:
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
            continue
    else: #only if there is no longer a sequence match, check this.
        if run > longest:
            longest = run
            run = 0
        else: #if the number of sequence matches is still smaller than longest, then make run zero.
            run = 0
    i += 1
earik87 is absolutely right! I would just like to add that the code is missing an = in the bounds check, which it needs to work in all cases, especially when you have redundant sequences.
while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples (I added =)
        if ((i + size) <= len(samples)):
            i = i + size
            continue
    else: #only if there is no longer a sequence match, check this.
        if run > longest:
            longest = run
            run = 0
        else: #if the number of sequence matches is still smaller than longest, then make run zero.
            run = 0
    i += 1
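For comparison, here is a minimal sketch of the counting logic discussed in this thread, written as a standalone function. It is not the asker's code: the name `longest_run` and the strategy of restarting the count from every start position are assumptions made for this illustration. The strategy is slower than a single pass, but it records a run even when it ends exactly at the end of the string.

```python
def longest_run(sequence, motif):
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    size = len(motif)
    if size == 0:
        return 0
    longest = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count copies of the motif that follow one another without a gap.
        while sequence[pos:pos + size] == motif:
            run += 1
            pos += size
        longest = max(longest, run)
    return longest


print(longest_run("TTAGATCAGATCAGATCGGAGATC", "AGATC"))
```

Because the slice at the end of the string is simply shorter than the motif, the inner loop stops on its own and no explicit bounds check is needed.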

How to find the largest number of times a word is repeated consecutively in a given string?

Okay, so this is kind of a confusing question, I will try and word it in the best way that I can.
I'm trying to figure out a way that I can find the largest consecutive repeats of a word in a string in Python
For example, let's say the word I want to look for is "apple" and the string is: "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple". Here, the largest number of consecutive repeats for the word "apple" is 3.
I have tried numerous ways of finding repeating character such as this:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
        if word[i-1]==word[i]:
            count+=1
        else :
            length += word[i-1]+" repeats "+str(count)+", "
            count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)
But this works with integers and not words. It also outputs the number of times the character repeats in general but does not identify the largest consecutive repeats. I hope that makes sense. My brain is kind of all over the place rn so I apologize if im trippin
Here is a solution I came up with that I believe solves your problem. There is almost certainly a simpler/faster way to do it if you spend more time with the problem which I would encourage.
import re
search_string = "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple"
search_term = "apple"
def search_for_term(search_string, search_term):
    #split string into array on search_term
    #keeps search term in array unlike normal string split
    split_string = re.split(f'({search_term})', search_string)
    #remove unnecessary characters
    split_string = list(filter(lambda x: x != "", split_string))
    #enumerate string and filter out instances that aren't the search term
    enum_string = list(filter(lambda x: x[1] == search_term, enumerate(split_string)))
    #loop through each of the items in the enumerated list and save to the current chain
    #once a chain breaks, i.e. the next element is not in order, append the current_chain to
    #the chains list and start over
    chains = []
    current_chain = []
    for idx, val in enum_string:
        if len(current_chain) == 0:
            current_chain.append(idx)
        elif idx == current_chain[-1] + 1:
            current_chain.append(idx)
        else:
            chains.append(current_chain)
            current_chain = [idx]
        print(chains, current_chain)
    #append anything leftover in the current_chain list to the chains list
    if len(current_chain) > 0:
        chains.append(current_chain)
        del current_chain
    #find the max length nested list in the chains list and return it
    max_length = max(map(len, chains))
    return max_length
max_length = search_for_term(search_string, search_term)
print(max_length)
Here is how I would do this: first check for 'apple' in randString, then check for 'appleapple', then 'appleappleapple', and so on until the search comes up empty. Keep track of the iteration count and voilà.
randString = "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple"
find = input('type in word to search for: ')
def consecutive():
    count =0
    for i in range(len(randString)):
        count +=1
        seachword = [find*count]
        check = [item for item in seachword if item in randString]
        if len(check) != 0:
            continue
        else:
            # Need to remove 1 from the final count.
            print (find, ":", count -1)
            break
consecutive()
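Another option, sketched here under the assumption that only the count is needed, is to work backwards through the string and record, for every index, how many copies of the word start there back to back. The function name `longest_consecutive_repeats` and the `runs` table are introduced for this illustration and are not taken from either answer.

```python
def longest_consecutive_repeats(text, word):
    """Return the largest number of back-to-back copies of `word` in `text`."""
    size = len(word)
    if size == 0:
        return 0
    # runs[i] = number of back-to-back copies of `word` starting at index i
    runs = [0] * (len(text) + 1)
    best = 0
    for i in range(len(text) - size, -1, -1):
        if text.startswith(word, i):
            runs[i] = runs[i + size] + 1
            best = max(best, runs[i])
    return best


print(longest_consecutive_repeats("xxabababyab", "ab"))
```

Each index is visited once, and because every start position is considered, a run that begins in the middle of an earlier partial match (for example "aba" inside "ababaaba") is still counted.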

CS50 (2020) PSET6 DNA IndexError: list index out of range

This has been a very difficult problem set for me, and I've had a tough time even getting something to work. At this time this is what my code looks like; I can't even tell if it outputs anything because I can't get it to run without an error.
import csv
from cs50 import get_string
from sys import argv
if len(argv) != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    exit()
# Load Dictionary into the system
with open(argv[1], 'r') as file:
    reader = csv.reader(file, delimiter=',')
    dna_data = list(reader)
    str_names = (dna_data[0][1:])
# Load Text DNA Sequence into system
text = open(argv[2], 'r')
sequence = text.read()
linecount = 0
# occurrences of each str in dna sequence
occurrences = []
for i in range(0,len(str_names)):
    start = 0
    substr_count = 0
    consecutive = 0
    while True:
        string = sequence
        substring = str(dna_data[i])
        location = string.find(substring,start)
        if location == -1:
            break
        if location != start:
            consecutive = 1
        else:
            consecutive += 1
        if consecutive > substr_count:
            substr_count = consecutive
        start = location + len(substring)
    occurrences.append(substr_count)
    substr_count = 0
repeats = []
match = 0
name = 0
for i in range (1, len(dna_data)):
    for j in range (1, len(dna_data[i])):
        repeats.append(dna_data[i][j])
    for k in range(1, len(dna_data)):
        if int(repeats[k]) == int(occurrences[k]): # THIS IS WHERE THE ERROR IS
            match += 1
            nomatch = False
        else:
            nomatch = True
    if match == len(occurrences) and nomatch == False:
        print(dna_data[i][0])
        name += 1
    else:
        nomatch = True
        match = 0
if nomatch == True and name == 0:
    print("No Match")
I am sure the answer is probably something simple, but I just can't seem to figure it out. I am pretty sure the list length of both "occurrences" and "repeats" are, in fact, both "len(dna_data)," but the error seems to state that k is increasing past the length of one of the lists. Is there anything here that is obviously causing a problem? How could I go about comparing the number present at position k within the repeats list and occurrences list? Thanks.
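One likely cause, reading the code as posted: occurrences gets one entry per STR name, while k runs over range(1, len(dna_data)), that is, over the number of rows in the CSV. When the file has more rows than STR columns, occurrences[k] runs past the end of the list. A minimal sketch of a row-by-row comparison that avoids index bookkeeping follows; the function name `find_match` and the sample rows are made up for this illustration, and `occurrences` is assumed to already hold one integer per STR column, in column order.

```python
def find_match(dna_data, occurrences):
    """Return the name whose STR counts equal `occurrences`, or None."""
    for row in dna_data[1:]:  # skip the header row
        name = row[0]
        counts = [int(value) for value in row[1:]]
        if counts == occurrences:
            return name
    return None


# Hypothetical data in the same shape as list(csv.reader(file)).
rows = [
    ["name", "AGATC", "AATG", "TATC"],
    ["Alice", "2", "8", "3"],
    ["Bob", "4", "1", "5"],
]
print(find_match(rows, [4, 1, 5]) or "No Match")
```

Comparing whole lists means each person's counts are checked against the sequence's counts in one step, so no shared `repeats` list has to be reset between rows.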

Why is this not correct? (codeeval challenge)PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It is scored as 85-90% partially solved, but not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
    saver=[]
    text=""
    textList=[]
    positionList=[]
    num=0
    exists=int()
    counter=0
    for l in test.strip().split(";"):
        saver.append(l)
    for i in saver[0].split(" "):
        textList.append(i)
    for j in saver[1].split(" "):
        positionList.append(j)
    for i in range(0,len(positionList)):
        positionList[i]=int(positionList[i])
    accomodator=[None]*len(textList)
    for n in range(1,len(textList)):
        if n not in positionList:
            accomodator[n]=textList[len(textList)-1]
            exists=n
    for item in positionList:
        accomodator[item-1]=textList[counter]
        counter+=1
        if counter>item:
            accomodator[exists-1]=textList[counter]
    for word in accomodator:
        text+=str(word) + " "
    print text
test_cases.close()
This code works for me:
import sys
def main(name_file):
    _file = open(name_file, 'r')
    text = ""
    while True:
        try:
            line = _file.next()
            disordered_line, numbers_string = line.split(';')
            numbers_list = map(int, numbers_string.strip().split(' '))
            missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
            if missing_number == 0:
                # the missing position is the number of words, not the number of characters
                missing_number = len(disordered_line.split(' '))
            numbers_list.append(missing_number)
            disordered_list = disordered_line.split(' ')
            string_position = zip(disordered_list, numbers_list)
            ordered = sorted(string_position, key = lambda x: x[1])
            text += " ".join([x[0] for x in ordered])
            text += "\n"
        except StopIteration:
            break
    _file.close()
    print text.strip()
if __name__ == '__main__':
    main(sys.argv[1])
I'll try to explain my code step by step, so maybe you can see the difference between your code and mine:
while True:
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteration exception, because it is raised when there are no more items in an iterator.
line = _file.next()
Read the file one line at a time through its iterator, so that you do not load all the lines into memory at once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the numbers giving each string's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int.
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the missing number from the series of numbers; that missing number is the position of the last string in the phrase.
if missing_number == 0:
    missing_number = len(disordered_line.split(' '))
If the missing number comes out as 0, then the real missing number is the number of strings that make up the phrase.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Convert the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase. The remaining code is easy to understand.
UPDATE
Looking at your code, here is my opinion on what might solve your problem.
split already returns a list, so you do not have to loop over the split content to add it to another list.
So these six lines:
for l in test.strip().split(";"):
    saver.append(l)
for i in saver[0].split(" "):
    textList.append(i)
for j in saver[1].split(" "):
    positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
The line positionList = map(int, splitted_test[1].split(" ")) already converts the numbers to int, so you save these two lines:
for i in range(0,len(positionList)):
    positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
    if n not in positionList:
        accomodator[n]=textList[len(textList)-1]
        exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
    missing_number = len(textList)
positionList.append(missing_number)
Basically, these lines calculate the missing number in the series of numbers, so that the length of the series is the same as that of textList.
The next lines:
for item in positionList:
    accomodator[item-1]=textList[counter]
    counter+=1
    if counter>item:
        accomodator[exists-1]=textList[counter]
for word in accomodator:
    text+=str(word) + " "
can be replaced by these:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
This way you save lines and memory. Also, use xrange instead of range.
The factors that make your code pass only partially could be:
The number of lines in the script.
The amount of time your script takes.
The amount of memory your script uses.
What you could do is:
Use generators; this saves memory.
Reduce the number of for loops; this saves lines of code and time.
If you think something could be done more simply, do it.
Do not reinvent the wheel; if something has already been written, use it.
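For readers on Python 3, the same idea can be sketched in a few lines. This is an illustration under assumptions, not the code that was submitted: it assumes each input line has the form "words;positions", that positions are 1-based, and that exactly one position (the one belonging to the last word) is left out. The function name `reorder` is made up here, and the missing position is computed from the full 1..n sum rather than from the smallest and largest values given.

```python
def reorder(line):
    """Rebuild a sentence from 'words;positions' where one position is omitted."""
    words_part, positions_part = line.strip().split(";")
    words = words_part.split(" ")
    positions = [int(p) for p in positions_part.split(" ")]
    # Positions are 1..n with exactly one missing; it belongs to the last word.
    n = len(words)
    missing = n * (n + 1) // 2 - sum(positions)
    positions.append(missing)
    ordered = sorted(zip(positions, words))
    return " ".join(word for _, word in ordered)


print(reorder("b c a;2 3"))
```

Summing 1..n directly also covers the case where the omitted position is 1, which a sum over the range between the smallest and largest given positions cannot detect.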

Python Convert String Literal to Float

I am working through the book "Introduction to Computation and Programming Using Python" by Dr. Guttag. I am working on the finger exercises for Chapter 3. I am stuck. It is section 3.2, page 25. The exercise is: Let s be a string that contains a sequence of decimal numbers separated by commas, e.g., s = '1.23,2.4,3.123'. Write a program that prints the sum of the numbers in s.
The previous example was:
total = 0
for c in '123456789':
    total += int(c)
print total
I've tried and tried but keep getting various errors. Here's my latest attempt.
total = 0
s = '1.23,2.4,3.123'
print s
float(s)
for c in s:
    total += c
    print c
print total
print 'The total should be ', 1.23+2.4+3.123
I get ValueError: invalid literal for float(): 1.23,2.4,3.123.
Floating point values cannot have a comma. You are passing 1.23,2.4,3.123 as it is to float function, which is not valid. First split the string based on comma,
s = "1.23,2.4,3.123"
print s.split(",") # ['1.23', '2.4', '3.123']
Then convert every element of that list to float and add them together to get the result. To feel the power of Python, this particular problem can be solved in the following ways.
You can find the total, like this
s = "1.23,2.4,3.123"
total = sum(map(float, s.split(",")))
If the number of elements is going to be too large, you can use a generator expression, like this
total = sum(float(item) for item in s.split(","))
All these versions will produce the same result as
total, s = 0, "1.23,2.4,3.123"
for current_number in s.split(","):
    total += float(current_number)
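Wrapped in a function, the split-and-sum approach might look like the sketch below (Python 3). The name `sum_decimal_string` is an assumption for this illustration, and skipping empty fields, so that a trailing comma or an empty string does not raise ValueError, is an extra that the exercise does not require.

```python
def sum_decimal_string(s):
    """Sum comma-separated decimal numbers, skipping empty fields."""
    # float() tolerates surrounding whitespace, so " 2.4 " parses fine.
    return sum(float(field) for field in s.split(",") if field.strip())


print(sum_decimal_string("1.23,2.4,3.123"))
```

Since the result is a binary floating-point number, comparing it to an expected value is safer with a small tolerance than with ==.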
Since you are starting with Python, you could try this simple approach:
Use the split(c) function, where c is a delimiter. With this you will have a list numbers (in the code below). Then you can iterate over each element of that list, casting each number to a float (because elements of numbers are strings) and sum them:
numbers = s.split(',')
sum = 0
for e in numbers:
    sum += float(e)
print sum
Output:
6.753
From the book Introduction to Computation and Programming Using Python, page 25:
"Let s be a string that contains a sequence of decimal numbers separated by commas, e.g., s = '1.23,2.4,3.123'. Write a program that prints the sum of the numbers in s."
If we use only what has been taught so far, then this code is one approach:
tmp = ''
num = 0
print('Enter a string of decimal numbers separated by comma:')
s = input('Enter the string: ')
for ch in s:
    if ch != ',':
        tmp = tmp + ch
    elif ch == ',':
        num = num + float(tmp)
        tmp = ''
# Also include last float number in sum and show result
print('The sum of all numbers is:', num + float(tmp))
total = 0
s = '1.23,2.4,3.123'
for c in s.split(','):
    total = total + float(c)
print(total)
Works like a charm.
I only used what I have learned so far:
s = raw_input('Enter a string that contains a sequence of decimal ' +
              'numbers separated by commas, e.g. 1.23,2.4,3.123: ')
s = "," + s+ ","
total =0
for i in range(0,len(s)):
    if s[i] == ",":
        for j in range(1,(len(s)-i)):
            if s[i+j] == ",":
                total = total + float(s[(i+1):(i+j)])
                break
print total
This is what I came up with:
s = raw_input('Enter a sequence of decimal numbers separated by commas: ')
aux = ''
total = 0
for c in s:
    aux = aux + c
    if c == ',':
        total = total + float(aux[0:len(aux)-1])
        aux = ''
total = total + float(aux) ##Uses last value stored in aux
print 'The sum of the numbers entered is ', total
I think they've revised this textbook since this question was asked (and since some of the others answered). I have the second edition of the text, and the split example is not on page 25. There's nothing prior to this lesson that shows you how to use split.
I wound up finding a different way of doing it using regular expressions. Here's my code:
# Intro to Python
# Chapter 3.2
# Finger Exercises
# Write a program that totals a sequence of decimal numbers
import re
total = 0 # initialize the running total
for s in re.findall(r'\d+\.\d+','1.23, 2.2, 5.4, 11.32, 18.1,22.1,19.0'):
    total = total + float(s)
print(total)
I've never considered myself dense when it comes to learning new things, but I'm having a hard time with (most of) the finger exercises in this book so far.
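The pattern \d+\.\d+ only matches numbers that have digits on both sides of a decimal point, so a plain 2 or a negative value would be skipped. A slightly more tolerant variant is sketched below; the pattern, the constant name `NUMBER`, and the function name `sum_numbers` are assumptions for this illustration and go beyond what the exercise asks for.

```python
import re

# Optional sign, then "12.5", "12.", ".5", or "12".
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)")


def sum_numbers(text):
    """Sum every decimal number found in `text`."""
    return sum(float(match) for match in NUMBER.findall(text))


print(sum_numbers("1.23, 2, -0.5,.5"))
```

The group is non-capturing on purpose: with a capturing group, `findall` would return only the group's contents instead of the whole match.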
s = input('Enter a sequence of decimal numbers separated by commas: ')
x = ''
sum = 0.0
for c in s:
    if c != ',':
        x = x + c
    else:
        sum = sum + float(x)
        x = ''
sum = sum + float(x)
print(sum)
This is using just the ideas already covered in the book at this point. Basically it goes through each character in the original string, s, using string addition to add each one to the next to build a new string, x, until it encounters a comma, at which point it changes what it has as x to a float and adds it to the sum variable, which started at zero. It then resets x back to an empty string and repeats until all the characters in s have been covered
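The character-by-character idea described above can also be written as a function, which makes it easy to try on several inputs. This is a sketch only: the name `sum_by_scanning` is made up for this illustration, and ignoring empty pieces (for example after a trailing comma) is an added assumption rather than part of the exercise.

```python
def sum_by_scanning(s):
    """Sum comma-separated decimals by building each number one character at a time."""
    total = 0.0
    current = ""
    for ch in s:
        if ch == ",":
            if current.strip():
                total += float(current)
            current = ""  # start collecting the next number
        else:
            current += ch
    if current.strip():
        total += float(current)  # the last number has no comma after it
    return total


print(sum_by_scanning("1.23,2.4,3.123"))
```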
Here's a solution without using split:
s='1.23,2.4,3.123,5.45343'
pos=[0]
total=0
for i in range(0,len(s)):
    if s[i]==',':
        pos.append(len(s[0:i]))
pos.append(len(s))
for j in range(len(pos)-1):
    if j==0:
        num=float(s[pos[j]:pos[j+1]])
        total=total+num
    else:
        num=float(s[pos[j]+1:pos[j+1]])
        total=total+num
print total
My way works:
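s = '1.23, 211.3'
total = 0
for x in s:
    for i in x:
        # note: this adds up the individual digits (1+2+3+2+1+1+3 = 13),
        # not the decimal values 1.23 and 211.3
        if i != ',' and i != ' ' and i != '.':
            total = total + int(i)
print total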
My answer is here:
s = '1.23,2.4,3.123'
total = 0 # running total; must be initialized before the loop
sum = 0
is_int_part = True
n = 0
for c in s:
    if c == '.':
        is_int_part = False
    elif c == ',':
        if is_int_part == True:
            total += sum
        else:
            total += sum/10.0**n
        sum = 0
        is_int_part = True
        n = 0
    else:
        sum *= 10
        sum += int(c)
        if is_int_part == False:
            n += 1
if is_int_part == True:
    total += sum
else:
    total += sum/10.0**n
print total
I managed to answer the question using only the knowledge gained up to section 3.2, the section on for loops:
s = '1.0, 1.1, 1.2'
print 'List of decimal number'
print s
total = 0.0
for c in s:
    if c == ',':
        total += float(s[0:(s.index(','))])
        d = int(s.index(','))+1
        s = s[(d+1) : len(s)]
s = float(s)
total += s
print '1.0 + 1.1 + 1.2 = ', total
This is my answer to the question. I feel that the split function is not a good fit for beginners like you and me.
Considering the fact that you might not yet be exposed to more complex functions, simply try these out.
total = 0
for c in "1.23","2.4","3.123":
    total += float(c)
print total
My answer:
s = '2.1,2.0'
countI = 0
countF = 0
totalS = 0
for num in s:
    if num == ',' or (countF + 1 == len(s)):
        totalS += float(s[countI:countF])
        if countF < len(s):
            countI = countF + 1
    countF += 1
print(totalS) # 4.1
This only works if the numbers are floats
Here is my answer. It is similar to the one by user5716300 above, but since I am also a beginner I explicitly created a separate variable s1 for the split string:
s = "1.23,2.4,3.123"
s1 = s.split(",") #this creates a list of strings
count = 0.0
for i in s1:
    count = count + float(i)
print(count)
If we are just sticking with the content for that chapter, I came up with this: (though using that sum method mentioned by theFourthEye is also pretty slick):
s = '1.23,3.4,4.5'
result = s.split(',')
result = list(map(float, result))
n = 0
add = 0
for a in result:
    add = add + result[n]
    n = n + 1
print(add)
I just want to post my answer because I am reading this book now.
s = '1.23,2.4,3.123'
ans = 0.0
i = 0
j = 0
for c in s:
    if c == ',':
        ans += float(s[i:j])
        i = j + 1
    j += 1
ans += float(s[i:j])
print(str(ans))
Using knowledge from the book:
s = '4.58,2.399,3.1456,7.655,9.343'
total = 0
index = 0
for string in s:
    index += 1
    if string == ',':
        temp = float(s[:index-1])
        s = s[index:]
        index = 0
        total += temp
        temp = 0
total += float(s)  # add the last number, which has no comma after it
print(total)
Here I used string slicing: the original string is sliced every time the 'string' variable is equal to ','. An index variable keeps track of where the number before the comma ends. After each slice, the number stored in temp and the comma following it are removed from s, leaving a shorter string without that number.
Because of this, the index variable needs to be reset every time this happens.
Here's mine using the exact string in the question and only what has been taught so far.
total = 0
temp_num = ''
for char in '1.23,2.4,3.123':
    if char == ',':
        total += float(temp_num)
        temp_num = ''
    else:
        temp_num += char
total += float(temp_num) #to catch the last number that has no comma after it
print(total)
I know this isn't covered in the book up to this point but I happened to learn the use of the eval() function on my own prior to getting to this question and used it to solve.
total = 0
s = "1.23,2.4,3.123"
x = eval(s)
y = sum(x)
print(y)
I think this is the easiest way to answer the question. It uses the split command, which is not introduced in the book at this moment but a very useful command.
s = input('Insert string of decimals, e.g. 1.4,5.55,12.651:')
sList = s.split(',') #create a list of these values
print(sList) #to check if list is correctly created
total = 0 #for creating the variable
for each in sList:
    total = total + float(each)
print(total)
total =0
s = {1.23,2.4,3.123}
for c in s:
    total = total+float(c)
print(total)
