Comparing characters in a string in Python - python

I want to compare following chars in a string and if they're equal, raise the counter.
With my example code I always get TypErrors related to line 6.
Do you know where's the problem?
Thank you!
def func(text):
counter = 0
text_l = text.lower()
for i in text_l:
if text_l[i+1] == text_l[i]:
print(text_l[i+1], text_l[i])
counter += 1
return counter

i is not an index. Your for will iterate over the elements directly, so i at any point of time is a character, not an integer. Use the range function if you want the indices:
for i in range(len(text_l) - 1): # i is the current index
if text_l[i + 1] == text_l[i]:
You can also use enumerate:
for i, c in enumerate(text_l[:-1]): # i is the current index, c is the current char
if text_l[i + 1] == c:
In either case, you'll want to iterate until the penultimate character because, you'll hit an IndexError on the last iteration with i + 1 as i + 1 is out of bounds for the last character.

Related

Always have this error 'IndexError: string index out of range', when i have taken into consideration of index range

Need to write a program that prints the longest substring of variable, in which the letters occur in alphabetical order.
eg. s = 'onsjdfjqiwkvftwfbx', it should returns 'dfjq'.
as a beginner, code written as below:
y=()
z=()
for i in range(len(s)-1):
letter=s[i]
while s[i]<=s[i+1]:
letter+=s[i+1]
i+=1
y=y+(letter,)
z=z+(len(letter),)
print(y[z.index(max(z))])
However, above code will always return
IndexError: string index out of range.
It will produce the desired result until I change it to range(len(s)-3).
Would like to seek advice on:
Why range(len(s)-1) will lead to such error message? In order to take care of index up to i+1, I have already reduce the range value by 1.
my rationale is, if the length of variable s is 14, it has index from 0-13, range(14) produce value 0-13. However as my code involves i+1 index, range is reduced by 1 to take care of this part.
How to amend above code to produce correct result.
if s = 'abcdefghijklmnopqrstuvwxyz', above code with range(len(s)-3) returns IndexError: string index out of range again. Why? what's wrong with this code?
Any help is appreciated~
Te reason for the out of range index is that in your internal while loop, you are advancing i without checking for its range. Your code is also very inefficient, as you have nested loops, and you are doing a lot of relatively expensive string concatenation. A linear time algorithm without concatenations would look something like this:
s = 'onsjdfjqiwkvftwfbcdefgxa'
# Start by assuming the longest substring is the first letter
longest_end = 0
longest_length = 1
length = 1
for i in range(1, len(s)):
if s[i] > s[i - 1]:
# If current character higher in order than previous increment current length
length += 1
if length > longest_length:
# If current length, longer than previous maximum, remember position
longest_end = i + 1
longest_length = length
else:
# If not increasing order, reset current length
length = 1
print(s[longest_end - longest_length:longest_end])
Regarding "1":
Actually, using range(len(s)-2) should also work.
The reason range(len(s)-1) breaks:
For 'onsjdfjqiwkvftwfbx', the len() will be equal to 18. Still, the max index you can refer is 17 (since indexing starts at 0).
Thus, when you loop through "i"'s, at some point, i will increase to 17 (which corresponds to len(s)-1) and then try access s[i+1] in your while comparison (which is impossible).
Regarding "2":
The following should work:
current_output = ''
biggest_output = ''
for letter in s:
if current_output == '':
current_output += letter
else:
if current_output[-1]<=letter:
current_output += letter
else:
if len(current_output) > len(biggest_output):
biggest_output = current_output
current_output = letter
if len(current_output) > len(biggest_output):
biggest_output = current_output
print(biggest_output)

How do I count up the number of letter pairs in python?

I am making a program in python that count up the number of letter pairs.
For example ------> 'ddogccatppig' will print 3, 'a' will print 0, 'dogcatpig' will print 0, 'aaaa' will print 3, and 'AAAAAAAAAA' will print 9.
My teacher told me to use a for loop to get the i and i+1 index to compare. I do not know how to do this, and I am really confused. My code:
def count_pairs( word ):
pairs = 0
chars = set(word)
for char in chars:
pairs += word.count(char + char)
return pairs
Please help me!!! Thank you.
The for loop is only to iterate through the appropriate values of i, not directly to do the comparison. You need to start i at 0, and iterate through i+1 being the last index in the string. Work this out on paper.
Alternately, use i-1 and i; then you want to start i-1 at 0, which takes less typing:
for i in range(1, len(word)):
if word[i] == word[i-1]:
...
Even better, don't use the counter at all -- make a list of equality results and count the True values:
return sum([word[i] == word[i-1] for i in range(1, len(word))])
This is a bit of a "dirty trick" using the fact that True evaluates as 1 and False as 0.
If you want to loop over indices instead of the actual characters, you can do:
for i in range(len(word)):
# do something with word[i] and/or word[i+1] or word[i-1]
Converting the string to a set first is counterproductive because it removes the ordering and the duplicates, making the entire problem impossible for two different reasons. :)
Here is an answer:
test = "ddogccatppig"
def count_pairs(test):
counter = 0
for i in range(0,len(test)-1):
if test[i] == test[i+1]
counter+=1
return counter
print(count_pairs(test))
Here you iterate through the length of the string (minus 1 because otherwise you will get an index out of bounds exception). Add to a counter if the letter is the same as the one in front and return it.
This is another (similar) way to get the same answer.
def charPairs(word):
word = list(word)
count = 0
for n in range(0, len(word)-1):
if word[n] == word[n+1]:
count +=1
return count
print(charPairs("ddogccatppig"))
print(charPairs("a"))
print(charPairs("dogcatpig"))
print(charPairs("aaaa"))
print(charPairs("AAAAAAAAAA"))

Taking long time to execute Python code for the definition

This is the problem definition:
Given a string of lowercase letters, determine the index of the
character whose removal will make a palindrome. If is already a
palindrome or no such character exists, then print -1. There will always
be a valid solution, and any correct answer is acceptable. For
example, if "bcbc", we can either remove 'b' at index or 'c' at index.
I tried this code:
# !/bin/python
import sys
def palindromeIndex(s):
# Complete this function
length = len(s)
index = 0
while index != length:
string = list(s)
del string[index]
if string == list(reversed(string)):
return index
index += 1
return -1
q = int(raw_input().strip())
for a0 in xrange(q):
s = raw_input().strip()
result = palindromeIndex(s)
print(result)
This code works for the smaller values. But taken hell lot of time for the larger inputs.
Here is the sample: Link to sample
the above one is the bigger sample which is to be decoded. But at the solution must run for the following input:
Input (stdin)
3
aaab
baa
aaa
Expected Output
3
0
-1
How to optimize the solution?
Here is a code that is optimized for the very task
def palindrome_index(s):
# Complete this function
rev = s[::-1]
if rev == s:
return -1
for i, (a, b) in enumerate(zip(s, rev)):
if a != b:
candidate = s[:i] + s[i + 1:]
if candidate == candidate[::-1]:
return i
else:
return len(s) - i - 1
First we calculate the reverse of the string. If rev equals the original, it was a palindrome to begin with. Then we iterate the characters at the both ends, keeping tab on the index as well:
for i, (a, b) in enumerate(zip(s, rev)):
a will hold the current character from the beginning of the string and b from the end. i will hold the index from the beginning of the string. If at any point a != b then it means that either a or b must be removed. Since there is always a solution, and it is always one character, we test if the removal of a results in a palindrome. If it does, we return the index of a, which is i. If it doesn't, then by necessity, the removal of b must result in a palindrome, therefore we return its index, counting from the end.
There is no need to convert the string to a list, as you can compare strings. This will remove a computation that is called a lot thus speeding up the process. To reverse a string, all you need to do is used slicing:
>>> s = "abcdef"
>>> s[::-1]
'fedcba'
So using this, you can re-write your function to:
def palindromeIndex(s):
if s == s[::-1]:
return -1
for i in range(len(s)):
c = s[:i] + s[i+1:]
if c == c[::-1]:
return i
return -1
and the tests from your question:
>>> palindromeIndex("aaab")
3
>>> palindromeIndex("baa")
0
>>> palindromeIndex("aaa")
-1
and for the first one in the link that you gave, the result was:
16722
which computed in about 900ms compared to your original function which took 17000ms but still gave the same result. So it is clear that this function is a drastic improvement. :)

Why my 2nd method is slower than my 1st method?

I was doing leetcode problem No. 387. First Unique Character in a String. Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
Examples:
s = "leetcode"
return 0.
s = "loveleetcode",
return 2.
I wrote 2 algorithm:
Method 1
def firstUniqChar(s):
d = {}
L = len(s)
for i in range(L):
if s[i] not in d:
d[s[i]] = [i]
else:
d[s[i]].append(i)
M = L
for k in d:
if len(d[k])==1:
if d[k][0]<M:
M = d[k][0]
if M<L:
return M
else:
return -1
This is very intuitive, i.e., first create a count dictionary by looping over all the char in s (this can also be done using one line in collections.Counter), then do a second loop only checking those keys whose value is a list of length 1. I think as I did 2 loops, it must have some redundant computation. So I wrote the 2nd algorithm, which I think is better than the 1st one but in the leetcode platform, the 2nd one runs much slower than the 1st one and I couldn't figure out why.
Method 2
def firstUniqChar(s):
d = {}
L = len(s)
A = []
for i in range(L):
if s[i] not in d:
d[s[i]] = i
A.append(i)
else:
try:
A.remove(d[s[i]])
except:
pass
if len(A)==0:
return -1
else:
return A[0]
The 2nd one just loop once for all char in s
Your first solution is O(n), but your second solution is O(n^2), as method A.remove is looping over elements of A.
As others have said - using list.remove is quite expensive... Your use of collections.Counter is a good idea.
You need to scan the string once to find uniques. Then probably what's better is to sequentially scan it again and take the index of the first unique - that makes your potential code:
from collections import Counter
s = "loveleetcode"
# Build a set of unique values
unique = {ch for ch, freq in Counter(s).items() if freq == 1}
# re-iterate over the string until we first find a unique value or
# not - default to -1 if none found
first_index = next((ix for ix, ch in enumerate(s) if ch in unique), -1)
# 2

Longest string in alphabetic order

This code works fine for all the strings, except for the ones where the last char is needed.
s='abcdefghijklmnopqrstuvwxyz'
sub =''
test =s[0]
for n in range(len(s)-1):
if len(test) > len(sub):
sub = test
if s[n] >= s[n-1]:
test += s[n]
else:
test = s[n]
print 'Longest substring in alphabetic order is: ' + str(sub)
How do you suggest a possibility for doing this?
Thank you in advance guys!
PS:
Thanks for the answers so far. The problem is that, no matter which range I type, the sub variable, which I will print, doesn't get all the chars I want. The loop finishes before :\ Maybe it's a problem with the program itself.
Any extra tips? :)
Your problem is with range(len(s)-1). Range generates a list up to its upper limit parameter value-1 so you don't need to subtract1 to len(s), use:
range(len(s))
From https://docs.python.org/2/library/functions.html#range
range(stop)
range(start, stop[, step])
This is a versatile function to create lists containing arithmetic progressions. It is most often used in for loops. The arguments must
be plain integers. If the step argument is omitted, it defaults to 1.
If the start argument is omitted, it defaults to 0. The full form
returns a list of plain integers [start, start + step, start + 2 *
step, ...]. If step is positive, the last element is the largest start
+ i * step less than stop; if step is negative, the last element is the smallest start + i * step greater than stop. step must not be zero
On the other hand, you are labeling your question as python2.7 so I assume you are using 2.7. If that is the case, it is more efficient to use xrange instead range because that way you will use an iterator instead generating a list.
EDIT
From the comments to this question, you can change your code to:
s='caabcdab'
sub =''
test =s[0]
for i in range(1,len(s)+1):
n = i%len(s)
if len(test) > len(sub):
sub = test
if i >= len(s):
break
if s[n] >= s[n-1]:
test += s[n]
else:
test = s[n]
print 'Logest substring in alphabetic order is: ' + str(sub)
you enemurate instead of range:
s='abcdefghijklmnopqrstuvwxyz'
sub =''
test =s[0]
for n,value in enumerate(s):
if len(test) > len(sub):
sub = test
if value >= s[n-1]:
test += s[n]
else:
test = s[n]
You could do use the following code:
s = 'abcdefgahijkblmnopqrstcuvwxyz'
sub = ''
test = s[0]
for index, character in enumerate(s):
if index > 0:
if character > s[index - 1]:
test += character
else:
test = character
if len(test) > len(sub):
sub = test
print 'Longest substring in alphabetic order is: ' + str(sub)
A few pointers too.
Python strings are iterable. i.e you can loop through them.
Use enumerate when you want the index of the list while iterating it.

Categories