I'm writing a program to calculate how many times the most repeated word in a list occurs. I keep getting an IndexError, even though printing the length of word_list shows that it has 108 elements. Could someone point me in the right direction as to where my error is?
length = len(word_list)
num = 0
print(length)
while num <= length:
    ele = word_list[num]
    if ele in wordDict:
        wordDict[ele] = wordDict[ele] +1
        repeat = repeat + 1
        if repeat > highestRepeat:
            highestRepeat = repeat
    else:
        wordDict[ele] = 1
        repeat = 1
    num = num+1
List indexing goes from 0 to length-1.
In your while loop, you've told num to go from 0 to length, so the final iteration reads word_list[length], which does not exist. That's why you get an IndexError.
Simply change num <= length to num < length. That should fix your code for you.
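For reference, here is a minimal sketch of the loop with the corrected bound. The function name highest_repeat is chosen here for illustration and is not from the question. It assumes the goal is simply the highest count of any single word, so the question's repeat variable is dropped, since the dictionary already holds each word's count.

```python
def highest_repeat(word_list):
    """Return how many times the most repeated word occurs (0 for an empty list)."""
    word_counts = {}
    highest = 0
    num = 0
    # num < len(word_list): valid indices run from 0 to len(word_list) - 1
    while num < len(word_list):
        word = word_list[num]
        word_counts[word] = word_counts.get(word, 0) + 1
        if word_counts[word] > highest:
            highest = word_counts[word]
        num += 1
    return highest
```

With this sketch, highest_repeat(['this', 'is', 'a', 'test', 'is']) gives 2.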
As an aside, there are much better ways to do this particular task. A simple two liner:
from collections import Counter
print(Counter(word_list).most_common(1))
Counter will calculate the frequency of each element in your list for you, and most_common(1) will return a one-element list containing a (word, count) tuple for the element with the highest frequency in your list.
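Since most_common(1) returns a list of (element, count) pairs rather than a bare element, it may help to unpack the result. A small sketch, where most_common_word is a hypothetical helper name and returning None for an empty list is an assumption made here:

```python
from collections import Counter

def most_common_word(word_list):
    """Return (word, count) for the most frequent word, or None for an empty list."""
    if not word_list:
        return None
    # most_common(1) yields a one-element list such as [('is', 2)]
    word, count = Counter(word_list).most_common(1)[0]
    return word, count
```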
Just to mention that there is a more compact solution to your problem:
word_list = ['this', 'is', 'a', 'test', 'is']
for word in set(word_list):
    print(word, ": ", word_list.count(word))
Okay, so this is kind of a confusing question, I will try and word it in the best way that I can.
I'm trying to figure out a way to find the largest number of consecutive repeats of a word in a string in Python.
For example, let's say the word I want to look for is "apple" and the string is: "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple". Here, the largest number of consecutive repeats for the word "apple" is 4.
I have tried numerous ways of finding repeating characters, such as this:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
        if word[i-1]==word[i]:
            count+=1
        else :
            length += word[i-1]+" repeats "+str(count)+", "
            count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)
But this works with single characters (digits here) and not words. It also outputs the length of every run of a character rather than identifying the largest number of consecutive repeats. I hope that makes sense. My brain is kind of all over the place right now, so I apologize if this is unclear.
Here is a solution I came up with that I believe solves your problem. There is almost certainly a simpler or faster way to do it if you spend more time with the problem, which I would encourage.
import re
search_string = "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple"
search_term = "apple"
def search_for_term(search_string, search_term):
    #split string into array on search_term
    #keeps search term in array unlike normal string split
    #re.escape guards against regex metacharacters in the search term
    split_string = re.split(f'({re.escape(search_term)})', search_string)
    #remove empty strings left over from the split
    split_string = list(filter(lambda x: x != "", split_string))
    #enumerate string and filter out instances that aren't the search term
    enum_string = list(filter(lambda x: x[1] == search_term, enumerate(split_string)))
    #loop through each of the items in the enumerated list and save to the current chain
    #once a chain breaks, i.e. the next element is not in order, append the current_chain to
    #the chains list and start over
    chains = []
    current_chain = []
    for idx, val in enum_string:
        if len(current_chain) == 0:
            current_chain.append(idx)
        elif idx == current_chain[-1] + 1:
            current_chain.append(idx)
        else:
            chains.append(current_chain)
            current_chain = [idx]
        print(chains, current_chain)
    #append anything leftover in the current_chain list to the chains list
    if len(current_chain) > 0:
        chains.append(current_chain)
    del current_chain
    #find the max length nested list in the chains list and return it
    #default=0 covers the case where the search term never occurs
    max_length = max(map(len, chains), default=0)
    return max_length
max_length = search_for_term(search_string, search_term)
print(max_length)
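As one possible simpler route, a regular expression can match each run of back-to-back copies directly. This is only a sketch: longest_run is a name made up here, and it scans left to right without overlapping matches, which is fine for a word like "apple" but may undercount for a term that can overlap itself (for example "aba").

```python
import re

def longest_run(text, term):
    """Return the largest number of back-to-back copies of term in text."""
    if not term:
        return 0
    # (?:term)+ matches one maximal run of consecutive copies of term
    pattern = re.compile(f"(?:{re.escape(term)})+")
    runs = (len(m.group(0)) // len(term) for m in pattern.finditer(text))
    # default=0 handles the case where term never occurs
    return max(runs, default=0)
```

For the string in the question, longest_run(search_string, "apple") gives 4 and longest_run(search_string, "orange") gives 3.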
Here is how I would do this. First check for 'apple' in randString, then check for 'appleapple', then 'appleappleapple', and so on until the search fails. Keep track of the iteration count and voilà.
randString = "applebananaorangeorangeorangebananaappleappleorangeappleappleappleapple"
find = input('type in word to search for: ')
def consecutive():
    count = 0
    for i in range(len(randString)):
        count += 1
        searchword = [find*count]
        check = [item for item in searchword if item in randString]
        if len(check) != 0:
            continue
        else:
            # Need to remove 1 from the final count.
            print (find, ":", count -1)
            break
consecutive()
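The same idea can be written as a function that takes the string and the word as arguments and returns the count, which makes it easier to reuse and test. This is a sketch only; max_consecutive is a hypothetical name, and returning 0 for an empty word is an assumption made here.

```python
def max_consecutive(text, word):
    """Return the largest k such that word repeated k times occurs in text."""
    if not word:
        return 0
    count = 0
    # Keep growing the repeated pattern while it still occurs in text;
    # the loop ends once the pattern no longer fits anywhere in text.
    while word * (count + 1) in text:
        count += 1
    return count
```

Because the substring test considers every starting position, this version also handles words that overlap themselves.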
I am always plagued with problems that involve checking values for indices i and i+1 within a for loop. However, doing so causes IndexError. One solution is to use range-1 but often that fails to check the last index value.
For example, given the following problem:
Write a function that compares two DNA sequences based on the
following scoring scheme: +1 for a match, +3 for each consecutive
match and -1 for each mismatch.
I wrote the solution in the following way:
def pairwiseScore(seqA, seqB):
    if len(seqA) != len(seqB):
        return 'Length mismatch'
    count = 0
    for i in range(len(seqA)-1):
        if seqA[i] == seqB[i] and seqA[i+1] == seqB[i+1]:
            count += 3
        elif seqA[i] == seqB[i]:
            count += 1
        else:
            count -= 1
        #print count
    return "Score: {c:d}".format(c=count)
print(pairwiseScore("ATTCGT", "ATCTAT"))
When I run this, I get a score of 1. This is because the program's missing the last index. I can see this if I print the values:
A A
T T
T C
C T
G A
Score: 1
[Finished in 0.1s]
It should return a score of 2.
Another string to check:
pairwiseScore("GATAAATCTGGTCT", "CATTCATCATGCAA")
This should return a score of 4
How do I resolve such types of problems?
You need something like this:
def pairwiseScore(seqA, seqB):
    a=len(seqA)
    if a != len(seqB):
        return 'Length mismatch'
    count = 0
    for i in range(0,a):
        if seqA[i] == seqB[i] and i+1<a and seqA[i+1] == seqB[i+1]:
            count += 3
        elif seqA[i] == seqB[i]:
            count += 1
        else:
            count -= 1
    return "Score: {c:d}".format(c=count)
print(pairwiseScore("GATAAATCTGGTCT", "CATTCATCATGCAA"))
Explanation:
Assume the lengths are equal. The first element in a list has index zero, which is why I'm using range(0,a). Then, if i+1<a is True, there is an element after seqA[i], so one can safely use seqB[i+1] as well, because the lengths are equal.
Moreover, range(0,a) counts from zero to a-1, where a=len(seqA). In Python 2.7.x range returns a list, which consumes memory proportional to its length. If len(seqA) might be very large, I'd suggest using xrange instead. In Python 3.x you don't need such a precaution.
This is happening because you're only checking the first len(seq)-1 positions. If you want to check all of the positions, you need to loop over the entire range(len(seq)). To avoid getting an IndexError, place a check at the beginning of the for loop to determine whether you're at the last position. If you are, don't make the consecutive sequence check.
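A minimal sketch of that last-position check follows. It interprets the scoring scheme the same way as the code in this thread (+3 when the current and the next position both match). The name pairwise_score, returning a plain integer, and raising ValueError on a length mismatch are choices made here for illustration.

```python
def pairwise_score(seq_a, seq_b):
    """Score two equal-length sequences: +3 consecutive match, +1 match, -1 mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Length mismatch")
    last = len(seq_a) - 1
    score = 0
    for i in range(len(seq_a)):
        if seq_a[i] != seq_b[i]:
            score -= 1
        # Only look at i + 1 when i is not the last position
        elif i < last and seq_a[i + 1] == seq_b[i + 1]:
            score += 3
        else:
            score += 1
    return score
```

With the two examples from the question, this sketch returns 2 for ("ATTCGT", "ATCTAT") and 4 for ("GATAAATCTGGTCT", "CATTCATCATGCAA").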
I am writing simple python code:
Question:
Given a list of strings, return the count of the number of
strings where the string length is 2 or more and the first
and last chars of the string are the same.
Solution I worked:
def match_ends(words):
    for items in words:
        count = 0
        los = len(items)
        first_char= items[0]
        last_char= items[los-1]
        if los >=2 and first_char is last_char:
            count = count+1
        else:
            count = count
    print(count)
    return
def main():
    print('match_ends')
    match_ends(['aba', 'xyz', 'aa', 'x', 'bbb'])
I keep getting 1 as the answer every time. I think it is not looping over the whole list. Where is the error?
Another, more concise way to do this is just:
sum(1 for s in words if len(s) > 1 and s[0] == s[-1])
I would use the == operator to compare the characters instead of the is keyword, because is tests object identity rather than equality. Also, you can use the [-1] index to get the last character instead of essentially doing [len-1]. You are also resetting count to 0 at the beginning of each loop iteration (count is also the name of a str and list method, so a more descriptive variable name avoids confusion).
That being said, here is the same idea with a few changes for readability and fixes for the above.
def matches(words):
    total = 0
    for word in words:
        if (len(word) > 1) and (word[0] == word[-1]):
            total += 1
    return total
>>> matches(['aba', 'xyz', 'aa', 'x', 'bbb'])
3
The reason is that count = 0 sits inside the loop, so it is reset on every iteration. You need to place count = 0 before the line for items in words:
I am required to input a string, calculate the number of vowels in that string, and then calculate the most and least occurring vowels. When the string contains no vowels at all, a message should print saying "no vowels were entered". Where there are vowels which occur the same number of times (e.g. if a and e both occur twice), these vowels should both appear as being most or least occurring. However, if some vowels are in the string but not all, then the ones which do not appear in the string should be ignored (rather than printing "a=0"). I think the counting part at the start is correct to an extent, but not quite. As for the most/least occurrences, I don't even know where to begin. Any help would be appreciated!
myString = str(input("Please type a sentence: ").lower())
count = [0, 0, 0, 0, 0]
for vowel in myString:
    if vowel == "a" :
        count[0]=count[0]+1
    if vowel == "e" :
        count[1]=count[1]+1
    if vowel == "i" :
        count[2]=count[2]+1
    if vowel == "o" :
        count[3]=count[3]+1
    if vowel == "u" :
        count[4]=count[4]+1
while count[0] > 0:
    print ("acount :",count[0])
    break
while count[1] > 0:
    print ("ecount :",count[1])
    break
while count[2] > 0:
    print ("icount :",count[2])
    break
while count[3] > 0:
    print ("ocount :",count[3])
    break
while count[4] > 0:
    print ("ucount :",count[4])
    break
else:
    if count[0] == 0 and count[1] == 0 and count[2] == 0 and count[3] == 0 and count[4] == 0:
        print ("No vowels were found")
from collections import Counter
d = Counter(input("Enter Sentence:"))
print(sorted("aeiou",key=lambda x:d.get(x,0)))
This seems like a much easier way to do it: it lists the vowels sorted from least to most frequent.
Well, you've got a list, count, with 5 counts in it. How do you find out which count is the highest? Just call max on it:
>>> count = [7, 1, 3, 10, 2]
>>> max(count)
10
Now that you know the max is 10, how do you know which letters have counts of 10?
>>> max_count = max(count)
>>> for i, n in enumerate(count):
...     if n == max_count:
...         # use i the same way you use it above
You should be able to figure out how to do the minimum count as well.
But there's one extra problem for minimum count: it sounds like you want the minimum that's not 0, not the absolute minimum. I'd write it like this:
>>> min_count = min(x for x in count if x>0)
… or, maybe more compactly:
>>> min_count = min(filter(bool, count))
But I'm guessing you don't understand comprehensions yet. In which case, you'll need to explicitly loop over the values, keeping track of the minimum value(s) that aren't 0. This implementation of max should help guide you in the right direction:
def my_max(iterable):
    max_value = None
    for value in iterable:
        if max_value is None or value > max_value:
            max_value = value
    return max_value
All that being said, this is one of many cases where using the right data structure makes the job a lot easier. For example, if you used a dictionary instead of a list, you could replace the whole first half of your code with this:
count = dict.fromkeys('aeiou', 0)
for vowel in myString:
    if vowel in 'aeiou':
        count[vowel] += 1
Using a defaultdict or a Counter makes it even easier; then you don't need to explicitly initialize the counts to 0.
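To make the Counter route concrete, here is a rough sketch that also handles ties and ignores vowels that never appear. The function name vowel_extremes and the choice to return None when there are no vowels are assumptions made for this example, not part of the question.

```python
from collections import Counter

def vowel_extremes(sentence):
    """Return (most_common_vowels, least_common_vowels), or None if no vowels occur."""
    # Only vowels that actually appear end up in the Counter, so zeros are ignored
    counts = Counter(ch for ch in sentence.lower() if ch in "aeiou")
    if not counts:
        return None
    highest = max(counts.values())
    lowest = min(counts.values())
    # Collect every vowel that ties for the highest or lowest count
    most = sorted(v for v, n in counts.items() if n == highest)
    least = sorted(v for v, n in counts.items() if n == lowest)
    return most, least
```

A caller could print "No vowels were found" when the result is None and otherwise report both lists.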
count=[0,0,0,0,0]
myString = str(input("Please type a sentence: ").lower())
for x in myString:
    flag = 'aeiou'.find(x)
    if flag>=0:
        count[flag] +=1
print(max(count))
Here the find method looks for x in 'aeiou': if it is found, find returns the position of x, otherwise it returns -1. So flag holds either the position of the vowel or -1.
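The behaviour of str.find described above can be checked with a small function. This is a sketch; vowel_counts is a name introduced here, and the returned list follows the a, e, i, o, u order used in the question.

```python
def vowel_counts(sentence):
    """Return counts of a, e, i, o, u (in that order) using str.find."""
    vowels = "aeiou"
    counts = [0] * len(vowels)
    for ch in sentence.lower():
        position = vowels.find(ch)  # -1 when ch is not a vowel
        if position >= 0:
            counts[position] += 1
    return counts
```

For instance, vowel_counts("Hello World") gives [0, 1, 0, 2, 0].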
Write a Python script that asks the user to enter two DNA
sequences with the same length. If the two sequences have
different lengths then output "Invalid input, the length must
be the same!" If inputs are valid, then compute how many DNA bases at the
same position are equal in these two sequences and output the
answer "x positions of these two sequences have the same
character". x is the actual number, depending on the user's
input.
Below is what I have so far.
g=input('Enter DNA Sequence: ')
h=input('Enter Second DNA Sequence: ')
i=0
count=0
if len(g)!=len(h):
    print('Invalid')
else:
    while i<=len(g):
        if g[i]==h[i]:
            count+=1
        i+=1
    print(count)
Do this instead of your while loop (choose better variable names in your actual code):
for i, j in zip(g, h):
    if i == j:
        count += 1
OR replace the loop entirely with
count = sum(1 for i, j in zip(g, h) if i == j)
This will fix your index error. In general, in Python you should loop over sequences directly rather than indexing into them. If you really want to index them, the i <= len(g) was the problem: it should be changed to i < len(g).
If you wanted to be really tricky, you could use the fact that True == 1 and False == 0:
count = sum(int(i == j) for i, j in zip(g, h))
The issue here is your loop condition. Your code gives you an IndexError; this means that you tried to access a character of a string, but there is no character at that index. What it means here is that i is greater than the len(g) - 1.
Consider this code:
while i<=len(g):
    print(i)
    i+=1
For g = "abc", it prints
0
1
2
3
Those are four numbers, not three! Since you start from 0, you must omit the last number, 3. You can adjust your condition as such:
while i < len(g):
    # do things
But in Python, you should avoid using while loops when a for-loop will do. Here, you can use a for-loop to iterate through a sequence, and zip to combine two sequences into one.
for i, j in zip(g, h):
    # i is the character of g, and j is the character of h
    # the task counts positions that are equal, so compare with ==
    if i == j:
        count += 1
You'll notice that you avoid the possibility of index errors and don't have to type so many [i]s.
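Putting the length check and the zip loop together, the whole task might look like the sketch below. The function names count_matching_positions and report are made up for this example; the two message strings are taken from the problem statement. Note that zip stops at the shorter input, which is why the length check comes first.

```python
def count_matching_positions(g, h):
    """Count positions where two equal-length sequences have the same character."""
    if len(g) != len(h):
        raise ValueError("Invalid input, the length must be the same!")
    # zip pairs up characters position by position, so no indexing is needed
    return sum(1 for a, b in zip(g, h) if a == b)

def report(g, h):
    """Return the message the exercise asks for, for either valid or invalid input."""
    try:
        x = count_matching_positions(g, h)
    except ValueError as err:
        return str(err)
    return f"{x} positions of these two sequences have the same character"
```

In a script, g and h would come from input(), and print(report(g, h)) would produce the required output.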
i<=len(g) - replace this with i<len(g), because index counting starts from 0, not 1. This is the error you are facing. But in addition, your code is not very pretty...
First way to simplify it, keeping your structure:
for i in range(len(g)):
    if g[i]==h[i]:
        count+=1
Even better, you can actually make it a one-liner:
sum(g[i]==h[i] for i in range(len(g)))
This relies on the fact that True counts as 1 (and False as 0) when summed in Python.
g = input('Enter DNA Sequence: ')
h = input('Enter Second DNA Sequence: ')
c = 0
count = 0
if len(g) != len(h):
    print('Invalid')
else:
    for i in g:
        if g[c] != h[c]:
            print("string does not match at : " + str(c))
            count = count + 1
        c = c + 1
    # count holds the number of mismatched positions
    print(count)
if(len(g)==len(h)):
    print(sum([1 for a,b in zip(g,h) if a==b]))
Edit: Fixed the unclosed parens. Thanks for the comments, will definitely look at the generator solution and learn a bit - thanks!