Coming from other languages, I know how to compare string indices to test for equality. But in Python I get the following error when trying to compare indices within a string.
TypeError: string indices must be integers
How can the indices of a string be compared for equality?
program.py
myString = 'aabccddde'
for i in myString:
    if myString[i] == myString[i + 1]:
        print('match')
Python for loops are sometimes known as "foreach" loops in other languages. You are not looping over indices, you are looping over the characters of the string. Strings are iterable and produce their characters when iterated over.
You may find this answer and part of this answer helpful to understand how exactly a for loop in Python works.
Regarding your actual problem, the itertools documentation has a recipe for this called pairwise. You can either copy-paste the function or import it from more_itertools (which needs to be installed). On Python 3.10 and later it is also available directly as itertools.pairwise.
Demo:
>>> # copy recipe from itertools docs or import from more_itertools
>>> from more_itertools import pairwise
>>> myString = 'aabccddde'
>>>
>>> for char, next_char in pairwise(myString):
...     if char == next_char:
...         print(char, next_char, 'match')
...
a a match
c c match
d d match
d d match
In my opinion, we should avoid explicitly using indices when iterating whenever possible. Python has high level abstractions that allow you to not get sidetracked by the indices most of the time. In addition, lots of things are iterable and can't even be indexed into using integers. The above code works for any iterable being passed to pairwise, not just strings.
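If installing more_itertools is not an option, the same adjacent-pair comparison can be sketched with the built-in zip. This is only an illustration: the function name adjacent_matches is made up for this example, and unlike pairwise it relies on slicing, so it works for sequences such as strings but not for arbitrary iterables.

```python
def adjacent_matches(text):
    """Return (index, char) for every position where text[i] == text[i + 1]."""
    # zip(text, text[1:]) pairs each character with its successor and
    # stops at the shorter argument, so no bounds check is needed.
    return [(i, a) for i, (a, b) in enumerate(zip(text, text[1:])) if a == b]


print(adjacent_matches('aabccddde'))
# [(0, 'a'), (3, 'c'), (5, 'd'), (6, 'd')]
```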
In this line:
for i in myString:
you are iterating over the individual characters, not over indices. So in the first iteration myString[i] means 'aabccddde'["a"], which is of course not possible.
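A small sketch that makes this visible: it collects what the loop variable actually holds and then tries to index the string with one of those values. The variable names are chosen for this illustration only, and the exact wording of the error message varies between Python versions.

```python
my_string = 'aabccddde'

# Iterating a string yields its characters, not integer positions.
items = list(my_string)[:3]

try:
    my_string[items[0]]  # same as my_string['a']
    error = None
except TypeError as exc:
    error = exc

print(items)  # ['a', 'a', 'b']
print(type(error).__name__, error)
```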
If you need the positions of the letters as well as the letters themselves, you can use enumerate, or range together with len:
myString = 'aabccddde'
for i in range(len(myString) - 1):  # stop one short so that i + 1 stays in range
    if myString[i] == myString[i + 1]:
        print('match')
You can use enumerate:
myString = 'aabccddde'
l = len(myString)
for i, j in enumerate(myString):
    if i == l-1: # This if block will prevent the error message for last index
        break
    if myString[i] == myString[i + 1]:
        print('match')
enumerate lets you iterate over each character together with its index in the string:
myString = 'aabccddde'
for idx, char in enumerate(myString):
    # guard against IndexError
    if idx + 1 == len(myString):
        break
    if char == myString[idx + 1]:
        print('match')
I implemented string contraction which lists the characters of a string with their respective counts. So, for example, the string "aadddza" becomes "a2d3z1a1". Can you offer suggestions on making what I have Pythonic?
def string_contraction(input_string):
    if type(input_string) != str:
        print("Not a string")
        return
    input_string = str.lower(input_string)
    prev = input_string[0]
    s = ""
    i = 0
    for lett in input_string:
        if lett == prev:
            i += 1
        else:
            s += prev+str(i)
            prev = lett
            i = 1
    s += lett+str(i)
    return(s)
The itertools.groupby function can do the "hard part" for you. Here's an example ("rle" is short for "run length encoding"):
def rle(s):
    from itertools import groupby
    return "".join(letter + str(len(list(group)))
                   for letter, group in groupby(s))
Then:
>>> rle("aadddza")
'a2d3z1a1'
It may take a while staring at the docs to figure out how this works - groupby() is quite logical, but not simple. In particular, the second element of each two-tuple it generates is not a list of matching elements from the original iterable, but is itself an iterable that generates a sequence of matching elements from the original iterable. Your application only cares about the number of matching elements. So the code uses list(group) to turn the iterable into a list, and then applies len() to that to get the number.
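To make the point about the group iterator concrete, here is a small sketch that counts each run without building an intermediate list. The helper name run_lengths is invented for this example, and rle is rewritten on top of it only for illustration.

```python
from itertools import groupby


def run_lengths(text):
    """Return (character, run length) pairs for consecutive runs in text."""
    # Each group is an iterator over the run; sum(1 for _ in group)
    # consumes it and counts its elements without creating a list.
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]


def rle(text):
    return "".join(char + str(count) for char, count in run_lengths(text))


print(run_lengths("aadddza"))  # [('a', 2), ('d', 3), ('z', 1), ('a', 1)]
print(rle("aadddza"))          # a2d3z1a1
```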
my_string = "aaabbcccaa"
required_output = "a3b2c3a2"
Please do not provide the code; I would like to try it myself. Please suggest an approach with which I can get the required output.
As suggested by @ggradnig, I've tried the code below.
def count_string(s):
    current_char= s[0]
    counter =0
    result_string = ''
    for i in range(len(s)):
        if s[i] == current_char:
            counter+=1
        if s[i] != current_char:
            result_string=result_string+current_char+str(counter)
            current_char = s[i]
            counter = 1
            continue
    result_string=result_string+current_char+str(counter)
    return result_string
given_string=count_string("aabbbccccaa")
print(given_string)
Please suggest changes to improve the above code
Declare a variable for the current character you are counting and a counter variable. Set the "current character" to the first character of the string and the counter to 0. Also, declare an empty result string.
Now iterate over each character of the input string and check whether it equals your "current character". If not, append the "current character" to your result string, followed by the current value of the counter. Then set the counter to 1 and set the "current character" to the character you are currently iterating over. If the character you're currently iterating over equals your "current character", increase your counter. After the loop, append the final "current character" and its count.
my_string = "aaabbcccaa" required_output= a3b2c3a2
You are counting consecutive characters. If the next character is the same, increment a count variable; if it is different, append the character and its count to the result, move on, and reset the count. When appending an integer to a string you have to convert it with str(). The time complexity is O(n).
Use hashing with a dictionary in Python: make each character a key and its number of occurrences the value. Then traverse the dictionary, take an empty string, and append each key and its value to it.
EDIT
I have changed my approach: instead of using sets inside a dictionary and making things complex, we can use a simpler approach. This program works.
string2=""
string=input()
i=0
while(i<len(string)):
    count=1
    for j in range(i+1,len(string)):
        if string[i]==string[j]:
            count+=1
        else:
            break
    string2=string2+string[i]
    string2=string2+str(count)
    i=i+count
print(string2)
The function groupby() in the itertools module makes this trivial. Read its documentation, which shows this example (in comments):
>>> [list(g) for k, g in groupby('AAAABBBCCD')]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]
From here you simply combine the first element of each sublist with the length of that sublist and you have your answer. Some string conversion and joining on an empty string are necessary, but it is basically a one-liner.
I don't understand why folks recommend the str.count() method. It combines two 'a' runs into one count which isn't desired and once you've broken the string into same letter substrings, len() can do the job.
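A short sketch of the difference, using collections.Counter to stand in for any whole-string counting approach (str.count or a dictionary of occurrences behaves the same way). The variable names are illustrative only.

```python
from collections import Counter
from itertools import groupby

text = "aaabbcccaa"

# Whole-string totals: both runs of 'a' are merged into a single count.
totals = Counter(text)

# Per-run lengths: each run of identical characters is counted on its own.
runs = [(char, len(list(group))) for char, group in groupby(text)]

print(totals['a'])  # 5
print(runs)         # [('a', 3), ('b', 2), ('c', 3), ('a', 2)]
```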
Please suggest changes to improve the above code
from itertools import groupby
string = 'aaabbcccaa'
def count_string(s):
    return ''.join(x[0] + str(len(x)) for x in (list(g) for _, g in groupby(s)))
print(string, '->', count_string(string))
Or if you wish to stay with your current algorithm, I suggest at least the following code cleanup:
def count_string(s):
    counter = 0
    result_string = ''
    current_char = s[0]
    for c in s:
        if c == current_char:
            counter += 1
        else:
            result_string += current_char + str(counter)
            current_char = c
            counter = 1
    return result_string + current_char + str(counter)
You should use a for loop and the string "count" method.
If you want me to post the code as well, tell me.
Python has two methods that you might find useful.
I think Python has a count method that you could use, or there is the len function, which you can pass your string to and which returns the length of the string.
I have a string containing letters and numbers like this -
12345A6789B12345C
How can I get a list that looks like this
[12345A, 6789B, 12345C]
>>> my_string = '12345A6789B12345C'
>>> import re
>>> re.findall(r'\d*\w', my_string)
['12345A', '6789B', '12345C']
For the sake of completeness, non-regex solution:
data = "12345A6789B12345C"
result = [""]
for char in data:
    result[-1] += char
    if char.isalpha():
        result.append("")
if not result[-1]:
    result.pop()
print(result)
# ['12345A', '6789B', '12345C']
Should be faster for smaller strings, but if you're working with huge data go with regex as once compiled and warmed up, the search separation happens on the 'fast' C side.
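For the compiled-regex route mentioned above, a minimal sketch could look like the following. The names CHUNK and split_after_letters are invented for this example, and the pattern uses [A-Za-z] instead of \w on the assumption that each chunk should end at an ASCII letter.

```python
import re

# Compile once and reuse: zero or more digits followed by a single letter.
CHUNK = re.compile(r'\d*[A-Za-z]')


def split_after_letters(text):
    """Return the chunks of text that end in a letter."""
    return CHUNK.findall(text)


print(split_after_letters("12345A6789B12345C"))
# ['12345A', '6789B', '12345C']
```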
You could build this with a generator, too. The approach below keeps track of start and end indices of each slice, yielding a generator of strings. You'll have to cast it to list to use it as one, though (splitonalpha(some_string)[-1] will fail, since generators aren't indexable)
def splitonalpha(s):
    start = 0
    for end, ch in enumerate(s, start=1):
        if ch.isalpha():  # call the method; a bare ch.isalpha is always truthy
            yield s[start:end]
            start = end
list(splitonalpha("12345A6789B12345C"))
# ['12345A', '6789B', '12345C']
>>> s.index("r")
>>> s.find("r")
The above only finds the first occurrence. For example:
>>> s = 'hello'
>>> s.find('l')
only outputs 2. What if we want the positions of both 'l' characters, so that the output is
[2, 3]
You can use list comprehension, with enumerate, like this
>>> [index for index, char in enumerate(s) if char == "l"]
[2, 3]
The enumerate function gives the current index as well as the current item in the iterable. So, in each iteration, you get the index and the corresponding character in the string. We check whether the character is l and, if it is, include the index in the resulting list.
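Since the question started from str.find, here is an alternative sketch that stays with that method and uses its optional start argument to continue the search after each hit. The function name find_all is hypothetical; unlike the enumerate version, it also works for substrings longer than one character.

```python
def find_all(text, sub):
    """Return every index at which sub starts in text (overlaps included)."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search one position after the previous hit.
        start = text.find(sub, start + 1)
    return positions


print(find_all('hello', 'l'))  # [2, 3]
```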
This is thefourtheye's answer expanded into a normal function:
def count_chars(string, char):
    results = []
    for index, value in enumerate(string):
        if value == char:
            results.append(index)
    return results
However, thefourtheye's answer is the best and most Pythonic one. List comprehensions, and comprehensions in general, are a very important part of Python (and one of my favorite parts). I strongly suggest reading up on them. Nothing wrong with being a little ahead of the curve :-).
thefourtheye's solution is the best but if you want a different method:
s = input("Enter a string")
c = input("Enter a char to search")
indexes = []
i = 0
while i < len(s):
    if s[i] == c:
        indexes.append(i)
    i += 1
print("{} appears {} times at the following index/indexes {}".format(c, len(indexes), indexes))
I have a set of strings, e.g.
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
I simply want to find the longest common portion of these strings, here the prefix. In the above the result should be
my_prefix_
The strings
my_prefix_what_ever
my_prefix_what_so_ever
my_doesnt_matter
should result in the prefix
my_
Is there a relatively painless way in Python to determine the prefix (without having to iterate over each character manually)?
PS: I'm using Python 2.6.3.
Never rewrite what is provided to you: os.path.commonprefix does exactly this:
Return the longest path prefix (taken
character-by-character) that is a prefix of all paths in list. If list
is empty, return the empty string (''). Note that this may return
invalid paths because it works a character at a time.
For comparison to the other answers, here's the code:
# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
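A quick usage sketch of os.path.commonprefix on the two lists from the question. The variable names are illustrative only.

```python
import os.path

first = ['my_prefix_what_ever',
         'my_prefix_what_so_ever',
         'my_prefix_doesnt_matter']
second = ['my_prefix_what_ever',
          'my_prefix_what_so_ever',
          'my_doesnt_matter']

# commonprefix compares character by character, so it works on any strings,
# not only on file system paths.
print(os.path.commonprefix(first))   # my_prefix_
print(os.path.commonprefix(second))  # my_
print(repr(os.path.commonprefix([])))  # ''
```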
Ned Batchelder is probably right. But for the fun of it, here's a more efficient version of phimuemue's answer using itertools.
import itertools
strings = ['my_prefix_what_ever',
'my_prefix_what_so_ever',
'my_prefix_doesnt_matter']
def all_same(x):
    return all(x[0] == y for y in x)
char_tuples = itertools.izip(*strings)
prefix_tuples = itertools.takewhile(all_same, char_tuples)
''.join(x[0] for x in prefix_tuples)
As an affront to readability, here's a one-line version :)
>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*strings)))
'my_prefix_'
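itertools.izip exists only in Python 2. On Python 3 the built-in zip is already lazy, so the same takewhile idea can be sketched as follows. The function name common_prefix is chosen for this example and is not part of the answer above.

```python
from itertools import takewhile


def common_prefix(strings):
    """Return the longest common prefix of the given strings."""
    def all_same(chars):
        return all(c == chars[0] for c in chars)

    # zip(*strings) yields one tuple of characters per position and stops at
    # the shortest string; takewhile stops at the first position that differs.
    return ''.join(chars[0] for chars in takewhile(all_same, zip(*strings)))


strings = ['my_prefix_what_ever',
           'my_prefix_what_so_ever',
           'my_prefix_doesnt_matter']
print(common_prefix(strings))  # my_prefix_
```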
Here's my solution:
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
prefix_len = len(a[0])
for x in a[1 : ]:
    prefix_len = min(prefix_len, len(x))
    while not x.startswith(a[0][ : prefix_len]):
        prefix_len -= 1
prefix = a[0][ : prefix_len]
The following is a working, but probably quite inefficient, solution.
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
b = zip(*a)
c = [x[0] for x in b if x==(x[0],)*len(x)]
result = "".join(c)
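For small sets of strings, the above is no problem at all. Note, however, that the list comprehension does not stop at the first mismatch: it keeps every position at which all strings agree, so it yields the prefix only when the strings never agree again after they first differ. For larger sets, I personally would code another, manual solution that checks each character one after another and stops at the first difference.
Algorithmically this is much the same procedure, but it stops early and avoids constructing the list c.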
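The manual, stop-at-the-first-difference approach mentioned above might look roughly like this. It is only a sketch; the function name common_prefix is made up for the example.

```python
def common_prefix(strings):
    """Return the longest common prefix, stopping at the first mismatch."""
    if not strings:
        return ''
    first = strings[0]
    for i, ch in enumerate(first):
        for other in strings[1:]:
            # Stop as soon as another string is too short or differs here.
            if i >= len(other) or other[i] != ch:
                return first[:i]
    return first


a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
print(common_prefix(a))               # my_prefix_
print(repr(common_prefix(["abc", "xbc"])))  # ''
```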
Just out of curiosity I figured out yet another way to do this:
def common_prefix(strings):
    if len(strings) == 1:  # rule out trivial case
        return strings[0]
    prefix = strings[0]
    for string in strings[1:]:
        while string[:len(prefix)] != prefix and prefix:
            prefix = prefix[:len(prefix)-1]
        if not prefix:
            break
    return prefix
strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]
print(common_prefix(strings))
#Prints "my_prefix_"
As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.
The line that builds lot applies the reduce function to the characters at each position of the input strings. The result is a list of N+1 elements, where N is the length of the shortest input string.
Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot: the length of the common prefix. out is that common prefix.
from functools import reduce  # reduce is a builtin only in Python 2
val = ["axc", "abc", "abc"]
lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
out = val[0][:lot.index(None)]
Here's a simple clean solution. The idea is to use zip() function to line up all the characters by putting them in a list of 1st characters, list of 2nd characters,...list of nth characters. Then iterate each list to check if they contain only 1 value.
a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
matches = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]
print(a[0][:matches.index(False) if False in matches else len(matches)])
output: my_prefix_
Here is another way of doing this using OrderedDict with minimal code.
import collections
import itertools
def commonprefix(instrings):
    """ Common prefix of a list of input strings using OrderedDict """
    d = collections.OrderedDict()
    for instring in instrings:
        for idx,char in enumerate(instring):
            # Make sure index is added into key
            d[(char, idx)] = d.get((char,idx), 0) + 1
    # Return prefix of keys while value == length(instrings)
    return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])
I had a slight variation of the problem and google sends me here, so I think it will be useful to document:
I have a list like:
my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter
some_noise
some_other_noise
So I would expect my_prefix to be returned. That can be done with:
from collections import Counter
def get_longest_common_prefix(values, min_length):
    substrings = [value[0: i-1] for value in values for i in range(min_length, len(value))]
    counter = Counter(substrings)
    # remove count of 1
    counter -= Counter(set(substrings))
    return max(counter, key=len)
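In one line without using itertools, for no particular reason, although it does iterate through each character. Note that it keeps every position where all strings agree rather than stopping at the first mismatch, so it only returns the prefix if the strings do not agree again later: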
''.join([z[0] for z in zip(*(list(s) for s in strings)) if all(x==z[0] for x in z)])
Find the common prefix of all words in the given list; if there is no common prefix, print -1.
stringList = ['my_prefix_what_ever', 'my_prefix_what_so_ever', 'my_prefix_doesnt_matter']
len2 = len( stringList )
if len2 != 0:
    # start with the shortest word as the candidate prefix
    prefix = min( stringList, key=len )
    for i in range( len2 ):
        word = stringList[ i ]
        len1 = len( prefix )
        # slice each word to the length of the prefix
        word = word[ 0:len1 ]
        for j in range( len1 ):
            # compare each letter of word and prefix
            if word[ j ] != prefix[ j ]:
                # if a letter does not match, shorten the prefix
                prefix = prefix[ :j ]
                break # after getting the common prefix move to the next word
    if len( prefix ) != 0:
        print("common prefix: ",prefix)
    else:
        print("-1")
else:
    print("string List is empty")