I keep running into problems that involve checking the values at indices i and i+1 within a for loop. Doing so naively raises an IndexError. One workaround is to loop over range(len(seq)-1), but that often fails to check the last index.
For example, given the following problem:
Write a function that compares two DNA sequences based on the
following scoring scheme: +1 for a match, +3 for each consecutive
match and -1 for each mismatch.
I wrote the solution in the following way:
def pairwiseScore(seqA, seqB):
    if len(seqA) != len(seqB):
        return 'Length mismatch'
    count = 0
    for i in range(len(seqA)-1):
        if seqA[i] == seqB[i] and seqA[i+1] == seqB[i+1]:
            count += 3
        elif seqA[i] == seqB[i]:
            count += 1
        else:
            count -= 1
        #print count
    return "Score: {c:d}".format(c=count)
print pairwiseScore("ATTCGT", "ATCTAT")
When I run this, I get a score of 1. This is because the program's missing the last index. I can see this if I print the values:
A A
T T
T C
C T
G A
Score: 1
[Finished in 0.1s]
It should return a score of 2.
Another string to check:
pairwiseScore("GATAAATCTGGTCT", "CATTCATCATGCAA")
This should return a score of 4
How do I resolve such types of problems?
You need something like this:
def pairwiseScore(seqA, seqB):
    a=len(seqA)
    if a != len(seqB):
        return 'Length mismatch'
    count = 0
    for i in range(0,a):
        if seqA[i] == seqB[i] and i+1<a and seqA[i+1] == seqB[i+1]:
            count += 3
        elif seqA[i] == seqB[i]:
            count += 1
        else:
            count -= 1
    return "Score: {c:d}".format(c=count)
print pairwiseScore("GATAAATCTGGTCT", "CATTCATCATGCAA")
Explanation:
Assume the lengths are equal. Indexing starts at zero, which is why I'm using range(0,a). If i+1<a is True, there is an element after seqA[i], and since the lengths are equal, seqB[i+1] is safe to use as well. Because the and operator short-circuits, seqA[i+1] is never evaluated when i+1<a is False.
Moreover, range(0,a) counts from zero to a-1, where a=len(seqA). In Python 2.7.x, range returns a list, which consumes memory proportional to its length. If len(seqA) may be very large, I'd suggest using xrange instead. In Python 3.x, range is already lazy, so no such precaution is needed.
This is happening because you're only checking the first len(seq)-1 characters. To check all of them, loop over the entire range(len(seq)). To avoid an IndexError, check inside the loop whether you're at the last position. If you are, skip the consecutive-match check.
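The same "look at position i and i+1" check can also be written without any index arithmetic. The following is a minimal Python 3 sketch, not the original poster's code: the name pairwise_score and the choice to raise ValueError on a length mismatch are assumptions made for illustration. It pairs every position with its successor and pads the last position with False, so the final character is scored but can never count as a consecutive match.

```python
def pairwise_score(seq_a, seq_b):
    """Score two equal-length sequences: +3 for a match that is followed by
    another match, +1 for any other match, -1 for a mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Length mismatch")
    # One boolean per position: True where the characters agree.
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    # Pair each position with its successor; the last position is paired
    # with False, so it can never count as a "consecutive" match.
    score = 0
    for current, following in zip(matches, matches[1:] + [False]):
        if current and following:
            score += 3
        elif current:
            score += 1
        else:
            score -= 1
    return score

print(pairwise_score("ATTCGT", "ATCTAT"))                  # 2
print(pairwise_score("GATAAATCTGGTCT", "CATTCATCATGCAA"))  # 4
```

Padding with a sentinel value is a common way to handle the "no next element" case without a separate bounds check inside the loop.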
I was given a prompt to solve and was able to write code that passed, but is there a simpler way to write this without creating a new named variable (s_index = 0)? The code works just fine, but I'm not sure I solved it the way I was expected to, and I'm open to suggestions for improvement :)
Please note that this section of the workbook has us focusing on using continue and break within loops.
"Simon Says" is a memory game where "Simon" outputs a sequence of 10 characters (R, G, B, Y)
and the user must repeat the sequence. Create a for loop that compares the two strings.
For each match, add one point to user_score. Upon a mismatch, end the game.
Sample output with inputs: 'RRGBRYYBGY' 'RRGBBRYBGY'
User score: 4
user_score = 0
simon_pattern = input()
user_pattern = input()
s_index = 0
for letter in user_pattern:
    if letter == simon_pattern[s_index]:
        user_score += 1
        s_index += 1
    else:
        break
print('User score:', user_score)
Using functions to encapsulate small, specific parts of your logic is often helpful:
def do_score(user_pattern="1234", simon_pattern="1235"):
    # using enumerate you can get the indices
    for s_index, a_char in enumerate(simon_pattern):
        if s_index >= len(user_pattern) or a_char != user_pattern[s_index]:
            # the index should always match the "sum" so no need to track or compute the sum
            return s_index
    return len(simon_pattern)
This function takes two strings and scores user_pattern against simon_pattern, returning the score.
Then just call:
print(do_score(user_entered_input,simon_pattern))
I would rewrite it this way, which eliminates the index variable and the simon_pattern[index] lookup entirely.
Note: in Python a string is just a sequence of characters, so you can iterate over it directly without using an index.
simon = 'RRGBRYYBGY'
user = 'RRGBBRYBGY'  # User score: 4
user_score = 0
for user_char, simon_char in zip(user, simon):
    if user_char == simon_char:  # continue to check/and match...
        user_score += 1
    else:  # break, if no match!
        break
print('User score:', user_score)
Strictly speaking, you never need to know the index or to index into the strings. You can just use zip() to pair up the corresponding characters of the two (possibly different-length) strings; it stops at the end of the shorter one:
def do_score(user_pattern='RRGBRYYBGY', simon_pattern='RRGBBRYBGY'):
    score = 0
    for uc, sc in zip(user_pattern, simon_pattern):
        if uc == sc:
            score += 1
        else:
            break
    return score
assert do_score('', '') == 0
assert do_score('RRG', '') == 0
assert do_score('', 'RRG') == 0
assert do_score('RRG', 'RRGB') == 3
Return the number of times that the string "code" appears anywhere in the given string, except we'll accept any letter for the 'd', so "cope" and "cooe" count.
count_code('aaacodebbb') → 1
count_code('codexxcode') → 2
count_code('cozexxcope') → 2
My Code
def count_code(str):
    count=0
    for n in range(len(str)):
        if str[n:n+2]=='co' and str[n+3]=='e':
            count+=1
    return count
I know the right code (just changing line 3 to use range(len(str)-3) will work), but I can't understand why str[n:n+2] works without the '-3' while str[n+3] does not.
Can someone clear up my doubt about this?
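Say our str was "abcde".
Without the -3 in range(len(str)), the index n would go through 0, 1, 2, 3, 4.
Evaluating str[n+3] with n being 4 would ask Python for index 7 of "abcde", whose last valid index is 4, and voila, an IndexError. The slice str[n:n+2], by contrast, never fails: slicing simply stops at the end of the string instead of raising.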
It is because the for loop runs over every index of the string, so when n refers to one of the last three characters, index n+3 does not exist. Python then tells you that the string index is out of range.
For example, in 'aaacodebbb' the index of the last character is 9. When the loop reaches n=9, n+3=12, and index 12 does not exist in the string, so str[n+3] would be out of range.
A for loop is an easy way to do that.
def count_code(str):
    count = 0
    for i in range(len(str)-3):
        # Stop 3 short of the end: for the last three values of i,
        # str[i+3] would be out of range and raise an IndexError.
        if str[i:i+2] == 'co' and str[i+3] == 'e':
            count += 1
    return count
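The difference between slicing and single-index access that the answers above rely on can be checked directly. This is a small Python 3 sketch; the parameter is renamed to text (an assumption made here, to avoid shadowing the built-in str), and the sample string "abcde" is taken from the answer above.

```python
def count_code(text):
    """Count occurrences of 'co?e', where ? is any single character."""
    count = 0
    # Stop 3 short of the end so that text[i + 3] is always a valid index.
    for i in range(len(text) - 3):
        if text[i:i + 2] == "co" and text[i + 3] == "e":
            count += 1
    return count

word = "abcde"
print(repr(word[4:6]))   # slice past the end is truncated: 'e'
print(repr(word[5:7]))   # slice entirely past the end: ''
try:
    word[7]              # a single index past the end raises
except IndexError as exc:
    print("IndexError:", exc)

print(count_code("aaacodebbb"))  # 1
print(count_code("codexxcode"))  # 2
print(count_code("cozexxcope"))  # 2
```

For strings shorter than three characters, range(len(text) - 3) is empty, so the function returns 0 without touching any index.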
From a string input of space-separated numbers such as 1 2 3 10 20, the numbers are to be stored in a list.
n=int(input())
x=input()
for i in range(len(x)-1):
    if x[i]!=" " and x[i+1]!=" ":
        k=x[i]*10+x[i+1]
        z.append(k)
        i=i+1
        continue
    elif x[i]!=" " and x[i+1]==" ":
        z.append(x[i])
    else:
        continue
for i in range(n):
    print(z[i])
It's showing that the output is:
1
2
3
4
11111111110
0
Why aren't the integers in the string performing the correct arithmetic operations when getting appended to the list?
x[i] is a string, not a number. When you multiply a string, it makes copies of the string. So when x[i] == '1' and x[i+1] == '0', k=x[i]*10+x[i+1] sets k to '11111111110'.
You need to do k = int(x[i])*10 + int(x[i+1]).
Another problem is that you can't skip over an element by doing i = i + 1. You're iterating over a range, so the for loop always assigns the next integer from range() to i at the start of each iteration, regardless of what i was set to in the loop body. If you want to be able to skip elements, you should use a while loop:
i = 0
while i < len(x)-1:
    ...
    i += 1
Note that your code only works for numbers with 1 or 2 digits. You should use a more general method of finding the boundaries between numbers to handle any number of digits.
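One possible way to find the boundaries between numbers of any length is to scan with a while loop and consume a whole run of digits at once. This is only a sketch: the function name parse_numbers is hypothetical, and it assumes the input contains plain ASCII digits and non-negative integers.

```python
def parse_numbers(text):
    """Collect runs of consecutive digits in text as integers."""
    numbers = []
    i = 0
    while i < len(text):
        if text[i].isdigit():
            start = i
            # Advance i to the first character after this run of digits.
            while i < len(text) and text[i].isdigit():
                i += 1
            numbers.append(int(text[start:i]))
        else:
            i += 1  # skip spaces (and any other non-digit character)
    return numbers

print(parse_numbers("1 2 3 10 20"))  # [1, 2, 3, 10, 20]
```

Because the inner loop moves i itself, skipping ahead works here in a way it cannot inside a for loop over range().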
Barmar has explained the issues with your code.
You can also achieve the desired result using:
n=int(input())
x=input().split(' ')
print(list(map(int, x)))
I'm writing a program to calculate how many times the most repeated word in a list occurs. I keep getting an IndexError, even though printing the length of word_list shows there are 108 elements. Could someone point me in the right direction as to where my error is?
length = len(word_list)
num = 0
print(length)
while num <= length:
    ele = word_list[num]
    if ele in wordDict:
        wordDict[ele] = wordDict[ele] +1
        repeat = repeat + 1
        if repeat > highestRepeat:
            highestRepeat = repeat
    else:
        wordDict[ele] = 1
        repeat = 1
    num = num+1
List indexing goes from 0 to length-1.
In your while loop, you've told num to go from 0 to length inclusive, and word_list[length] does not exist. That's why you have an index error.
Simply change num <= length to num < length. That should fix your code for you.
As an aside, there are much better ways to do this particular task. A simple two liner:
from collections import Counter
print(Counter(word_list).most_common(1))
Counter counts how often each element occurs in your list, and most_common(1) returns a one-element list containing the most frequent element together with its count, as an (element, count) tuple.
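Since the question asks for the number of occurrences rather than the word itself, the tuple returned by most_common(1) needs to be unpacked. Here is a short sketch; the helper name most_repeated and the sample list are assumptions for illustration, not part of the original program.

```python
from collections import Counter

def most_repeated(words):
    """Return (word, count) for the most frequent word, or None if words is empty."""
    if not words:
        return None
    # most_common(1) yields a one-element list of (element, count) tuples.
    return Counter(words).most_common(1)[0]

word_list = ["this", "is", "a", "test", "is"]  # hypothetical sample data
print(most_repeated(word_list))  # ('is', 2)
```

The empty-list guard matters because most_common(1) returns an empty list for an empty Counter, and indexing it with [0] would itself raise an IndexError.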
Just to mention that there is a more compact solution to your problem:
word_list =['this' ,'is', 'a', 'test', 'is']
for word in set(word_list):
    print word, ": ", word_list.count(word)
Write a Python script that asks the user to enter two DNA
sequences with the same length. If the two sequences have
different lengths then output "Invalid input, the length must
be the same!" If inputs are valid, then compute how many dna bases at the
same position are equal in these two sequences and output the
answer "x positions of these two sequences have the same
character". x is the actual number, depending on the user's
input.
Below is what I have so far.
g=input('Enter DNA Sequence: ')
h=input('Enter Second DNA Sequence: ')
i=0
count=0
if len(g)!=len(h):
    print('Invalid')
else:
    while i<=len(g):
        if g[i]==h[i]:
            count+=1
        i+=1
    print(count)
Replace your while loop with this (choose better variable names in your actual code):
for i, j in zip(g, h):
    if i == j:
        count += 1
OR replace the loop entirely with
count = sum(1 for i, j in zip(g, h) if i == j)
This will fix your index error. In general, you shouldn't be indexing lists in Python, but looping over them. If you really want to index them, the i <= len(g) was the problem: it should be changed to i < len(g).
If you wanted to be really tricky, you could use the fact that True == 1 and False == 0:
count = sum(int(i == j) for i, j in zip(g, h))
The issue here is your loop condition. Your code gives you an IndexError; this means that you tried to access a character of a string, but there is no character at that index. Here, it means that i has become greater than len(g) - 1.
Consider this code:
while i<=len(g):
    print(i)
    i+=1
For g = "abc", it prints
0
1
2
3
Those are four numbers, not three! Since you start from 0, you must omit the last number, 3. You can adjust your condition as such:
while i < len(g):
    # do things
But in Python, you should avoid using while loops when a for-loop will do. Here, you can use a for-loop to iterate through a sequence, and zip to combine two sequences into one.
for i, j in zip(g, h):
    # i is the character of g, and j is the character of h
    if i == j:
        count += 1
You'll notice that you avoid the possibility of index errors and don't have to type so many [i]s.
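Putting the length check and the zip-based count together, the whole task might look like the sketch below. The function name count_matching_positions and the decision to raise ValueError (instead of printing the message) are assumptions made here so the logic can be tested separately from input().

```python
def count_matching_positions(g, h):
    """Return how many positions hold the same character in g and h."""
    if len(g) != len(h):
        raise ValueError("Invalid input, the length must be the same!")
    return sum(1 for a, b in zip(g, h) if a == b)

x = count_matching_positions("ATTCGT", "ATCTAT")
print(x, "positions of these two sequences have the same character")
```

In a script that reads from the user, the two arguments would come from input() and the ValueError message could simply be printed.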
i<=len(g) - replace this with i<len(g), because index counting starts from 0, not 1. This is the error you are facing. But in addition, your code is not very pretty...
First way to simplify it, keeping your structure:
for i in range(len(g)):
    if g[i]==h[i]:
        count+=1
Even better, you can actually make it a one-liner:
sum(g[i]==h[i] for i in range(len(g)))
Here the fact that True is evaluated to 1 in Python is used.
g = raw_input('Enter DNA Sequence: ')
h = raw_input('Enter Second DNA Sequence: ')
c = 0
count = 0
if len(g) != len(h):
    print('Invalid')
else:
    for i in g:
        if g[c] != h[c]:
            print "string does not match at : " + str(c)
            # count tallies the mismatching positions
            count = count + 1
        c = c + 1
    print(count)
if(len(g)==len(h)):
    print sum([1 for a,b in zip(g,h) if a==b])
Edit: Fixed the unclosed parens. Thanks for the comments, will definitely look at the generator solution and learn a bit - thanks!