Why am I having a string indexing problem in Python?

I'm trying to understand why I get the same index again when I apply .index or .find to a repeated letter.
Why do I get index 2 a second time instead of 3, and what is an alternative way to get index 3 for the second 'l'?
text = 'Hello'
for i in text:
    print(text.index(i))
The output is:
0
1
2
2
4

It's because .index() returns the lowest or first index of the substring within the string. Since the first occurrence of l in hello is at index 2, you'll always get 2 for "hello".index("l").
So when you're iterating through the characters of hello, you get 2 twice and never 3 (for the second l). Expanded into separate lines, it looks like this:
"hello".index("h") # = 0
"hello".index("e") # = 1
"hello".index("l") # = 2
"hello".index("l") # = 2
"hello".index("o") # = 4
Edit: Alternative way to get all indices:
One way to print all the indices (although not sure how useful this is since it just prints consecutive numbers) is to remove the character you just read from the string:
removed = 0
string = "hello world"  # original string
for char in string:
    print("{} at index {}".format(char, string.index(char) + removed))  # index is index() + how many chars we've removed
    string = string[1:]  # remove the char we just read
    removed += 1  # increment removed count

text = 'Hello'
for idx, ch in enumerate(text):
    print(f'char {ch} at index {idx}')
Output:
char H at index 0
char e at index 1
char l at index 2
char l at index 3
char o at index 4
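Building on enumerate, here is a minimal sketch that groups every index by character, so a repeated letter keeps all of its positions. The helper name positions_by_char is made up for illustration and is not part of the original answer.

```python
from collections import defaultdict

def positions_by_char(text):
    """Map each character to the list of indices where it occurs (hypothetical helper)."""
    positions = defaultdict(list)
    for idx, ch in enumerate(text):
        positions[ch].append(idx)
    return dict(positions)

print(positions_by_char('Hello'))
# {'H': [0], 'e': [1], 'l': [2, 3], 'o': [4]}
```

Unlike text.index(ch), this walks the string once and never re-searches from the start.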

If you want to find the second occurrence, you should search in the substring after the first occurrence:
text = 'Hello'
first_index = text.index('l')
print('First index:', first_index)
second_index = text.index('l', first_index+1) # Search in the substring after the first occurrence
print('Second index:', second_index)
The output is:
First index: 2
Second index: 3

Related

Flipping a matrix-like string horizontally

The goal of this function is to flip a matrix-like string horizontally.
For example, the string '100010001' with 3 rows and 3 columns would look like:
1 0 0
0 1 0
0 0 1
but when flipped should look like:
0 0 1
0 1 0
1 0 0
So the function would return the following output:
'001010100'
The caveat: I cannot use lists or arrays, only strings.
I believe the code I have written should work; however, it returns an empty string.
def flip_horizontal(image, rows, column):
    horizontal_image = ''
    for i in range(rows):
        #This should slice the image string, and map image(the last element in the
        #column : to the first element of the column) onto horizontal_image.
        #this will repeat for the given amount of rows
        horizontal_image = horizontal_image + image[(i+1)*column-1:i*column]
    return horizontal_image
Again this returns an empty string. Any clue what the issue is?
Use [::-1] to reverse each row of the image.
def flip(im, w):
    return ''.join(im[i:i+w][::-1] for i in range(0, len(im), w))
>>> im = '100010001'
>>> flip(im, 3)
'001010100'
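As for why the original function returns an empty string: in image[(i+1)*column-1:i*column] the start index is larger than the stop index, and with the default step of +1 such a slice is always empty. A hedged sketch of a fix that keeps the question's signature (the parameter name columns and the local names are my own choices) slices each row forwards and then reverses it:

```python
def flip_horizontal(image, rows, columns):
    """Reverse each row of a row-major 'matrix' string, using strings only."""
    flipped = ''
    for i in range(rows):
        row = image[i * columns:(i + 1) * columns]  # forward slice of row i
        flipped += row[::-1]                        # append the reversed row
    return flipped

print(repr('100010001'[2:0]))              # '' because start > stop with step +1
print(flip_horizontal('100010001', 3, 3))  # 001010100
print(flip_horizontal('123456', 2, 3))     # 321654
```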
The range function can be used to isolate your string into steps that represent rows. While iterating through the string you can use [::-1] to reverse each row to achieve the horizontal flip.
string = '100010001'
output = ''
prev = 0
# Iterate through string in steps of 3
for i in range(3, len(string) + 1, 3):
    # Isolate and reverse row of string
    row = string[prev:i]
    row = row[::-1]
    output = output + row
    prev = i
Input:
'100
010
001'
Output:
'001
010
100'

How to count the number of triplets found in a string?

string1 = "abbbcccd"
string2 = "abbbbdccc"
How do I find the number of triplets in a string? A triplet is a character that appears 3 times in a row. Triplets can also overlap, for example in string 2 (abbbbdccc).
The output should be:
2 <-- string 1
3 <-- string 2
I'm new to Python and Stack Overflow, so any help or advice on question writing would be much appreciated.
Try iterating through the string with a while loop, and comparing if the character and the two other characters in front of that character are the same. This works for overlap as well.
string1 = "abbbcccd"
string2 = "abbbbdccc"
string3 = "abbbbddddddccc"
def triplet_count(string):
    it = 0  # iterator of string
    cnt = 0  # count of number of triplets
    while it < len(string) - 2:
        if string[it] == string[it + 1] == string[it + 2]:
            cnt += 1
        it += 1
    return cnt
print(triplet_count(string1)) # returns 2
print(triplet_count(string2)) # returns 3
print(triplet_count(string3)) # returns 7
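The same comparison of each character with the two that follow it can be written more compactly. This is only a sketch of an equivalent formulation, not the answerer's code: zip pairs the string with itself shifted by one and by two positions, and sum counts the positions where all three characters match.

```python
def triplet_count(s):
    """Count positions where three consecutive characters are equal (overlaps included)."""
    return sum(a == b == c for a, b, c in zip(s, s[1:], s[2:]))

print(triplet_count("abbbcccd"))        # 2
print(triplet_count("abbbbdccc"))       # 3
print(triplet_count("abbbbddddddccc"))  # 7
```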
This simple script should work...
my_string = "aaabbcccddddd"
# Some required variables
old_char = None
two_in_a_row = False
triplet_count = 0
# Iterates through characters of a given string
for char in my_string:
    # Checks if previous character matches current character
    if old_char == char:
        # Checks if there already has been two in a row (and hence now a triplet)
        if two_in_a_row:
            triplet_count += 1
        two_in_a_row = True
    # Resets the two_in_a_row boolean variable if there's a non-match.
    else:
        two_in_a_row = False
    old_char = char
print(triplet_count) # prints 5 for the example my_string I've given

How do I check the 2nd character of every string for more than one occurrence?

['AC', '2H', '3S', '4C']
How do I check whether the character at index 1 (i.e. the 2nd character) of every string occurs more than once? For example, in this case C occurs 2 times, so I need to return False.
This must apply to other cases as well, such as H or S occurring more than once.
Consider using collections.Counter to count the occurrences of the items of interest, and use all or any to verify the condition.
import collections
a = ['AC', '2H', '3S', '4C']
counter = collections.Counter(s[1] for s in a)
result = all(v < 2 for v in counter.values())
print(result)
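If only the yes/no answer matters, a set can stand in for the counter. This is a sketch under the assumption that every string has at least two characters; the function name second_chars_unique is hypothetical.

```python
def second_chars_unique(items):
    """Return True if no character at index 1 is shared by two strings."""
    seconds = [s[1] for s in items]
    # A set drops duplicates, so equal lengths mean there were none.
    return len(set(seconds)) == len(seconds)

print(second_chars_unique(['AC', '2H', '3S', '4C']))  # False
print(second_chars_unique(['AC', '2H', '3S', '4D']))  # True
```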
You can use this function:
def check_amount(all_text, character):
    count = 0
    for text in all_text:
        for ch in text:
            if ch == character:
                count += 1
    return count
This returns how many times the character occurs. If you just want to see whether it exists:
def check_amount(all_text, character):
    for text in all_text:
        for ch in text:
            if ch == character:
                return True
    return False
Those check for the character at any position. If you need it to be at a specific position, as you said:
def check_amount(all_text, character):
    count = 0
    for text in all_text:
        if text[1] == character:
            count += 1
    return count
You can turn this into a boolean version in the same way, returning as soon as a match is found instead of counting.
Here all_text is the list you want to pass in, and character is the character you want to look for.
Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:
>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
        print('ll found', m.start(), m.end())
ll found 1 3
ll found 10 12
ll found 16 18
Alternatively, if you don't want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:
>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
        index = text.find('ll', index)
        if index == -1:
            break
        print('ll found at', index)
        index += 2 # +2 because len('ll') == 2
ll found at 1
ll found at 10
ll found at 16
This also works for lists and other sequences.
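The str.find loop above can be wrapped in a reusable generator. This is a sketch rather than part of the original answer: the name find_all and the overlapping flag are my own additions, and the flag switches between advancing by the pattern length (as above) and advancing by one character.

```python
def find_all(text, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in text (hypothetical helper)."""
    if not sub:
        return  # an empty pattern would match everywhere; treat it as no matches
    step = 1 if overlapping else len(sub)
    index = text.find(sub)
    while index != -1:
        yield index
        index = text.find(sub, index + step)

print(list(find_all('Allowed Hello Hollow', 'll')))    # [1, 10, 16]
print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```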
For a list, I'd use a list comprehension, like this:
listOfElems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test', 'Ok']
Now let's find all indexes of 'Ok' in the list:
# Use a list comprehension to get the indexes of all occurrences of 'Ok' in the list
indexPosList = [ i for i in range(len(listOfElems)) if listOfElems[i] == 'Ok' ]
print('Indexes of all occurrences of "Ok" in the list are: ', indexPosList)
output:
Indexes of all occurrences of "Ok" in the list are:  [1, 3, 9]
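A slightly more idiomatic variant of the same comprehension, offered as a sketch, uses enumerate instead of range(len(...)); the snake_case names are my own renaming of the variables above.

```python
list_of_elems = ['Hello', 'Ok', 'is', 'Ok', 'test', 'this', 'is', 'a', 'test', 'Ok']

# enumerate yields (index, element) pairs, so no explicit indexing is needed
index_pos_list = [i for i, elem in enumerate(list_of_elems) if elem == 'Ok']
print(index_pos_list)  # [1, 3, 9]
```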

Extract substring pattern

I have a long file with about 1200 sequences, like this:
>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
I want to extract every pattern that has a cysteine (C) in the middle, with five characters before it and five characters after it, such as xxxxxCxxxxx.
The output should look like this:
QDIQLCGMGIL
ILPEHCIIDIT
TISDNCVVIFS
FSKTSCSYCTM
This program only gives the position of C; it does not do what I want:
pos=[]
def find(ch,string1):
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
            return pos
z=find('C','AWERQRTCWERTYCTAAAACTTCTTT')
print z
You need to return outside the loop. You are returning on the first match, so you only ever get a single index in your list:
def find(ch,string1):
    pos = []
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
    return pos # outside
You can also use enumerate with a list comprehension in place of your range logic:
def indexes(ch, s1):
    return [index for index, char in enumerate(s1) if char == ch and 5 <= index <= len(s1) - 6]
Each index in the list comp is the character index and each char is the actual character so we keep each index where char is equal to ch.
If you want the five chars that are both sides:
In [24]: s="CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP"
In [25]: inds = indexes("C",s)
In [26]: [s[i-5:i+6] for i in inds]
Out[26]: ['QDIQLCGMGIL', 'ILPEHCIIDIT']
I added checking the index as we obviously cannot get five chars before C if the index is < 5 and the same from the end.
You can do it all in a single function, yielding a slice when you find a match:
def find(ch, s):
    ln = len(s)
    for i, char in enumerate(s):
        if ch == char and 5 <= i <= ln - 6:
            yield s[i - 5:i + 6]
Presuming the data in your question is actually two lines from your file, like:
s=""">3fm8|A|A0JLQ2CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDALYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCY"""
Running:
for line in s.splitlines():
    print(list(find("C", line)))
would output:
['0JLQ2CFLVNL', 'QDIQLCGMGIL', 'ILPEHCIIDIT']
['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
This gives six matches, not the four your expected output suggests, so I presume you did not include all possible matches.
You can also speed up the code using str.find, starting at the last match index + 1 for each subsequent match:
def find(ch, s):
    # start searching at index 5: an earlier match has no five characters before it
    ln, i = len(s) - 6, s.find(ch, 5)
    while 5 <= i <= ln:
        yield s[i - 5:i + 6]
        i = s.find(ch, i + 1)
Which will give the same output. Of course if the strings cannot overlap you can start looking for the next match much further in the string each time.
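The answers above search one line at a time, but the sequences in the question wrap over several lines. Assuming the file is in FASTA-like form (a header line starting with '>' followed by one or more sequence lines), a minimal sketch could join each record before searching. The names read_fasta and windows are hypothetical, and the sample text is copied from the question.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)

def windows(seq, ch='C', flank=5):
    """Yield every window of flank chars + ch + flank chars found in seq."""
    i = seq.find(ch, flank)  # earlier matches lack a full left flank
    while i != -1 and i + flank < len(seq):
        yield seq[i - flank:i + flank + 1]
        i = seq.find(ch, i + 1)

fasta_text = """>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
"""

results = {h: list(windows(s)) for h, s in read_fasta(fasta_text.splitlines())}
for header, found in results.items():
    print(header, found)
# 3fm8|A|A0JLQ2 ['QDIQLCGMGIL', 'ILPEHCIIDIT']
# 2ht9|A|A0JLT0 ['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
```

Joining the lines first also catches a window that straddles a line break, which a line-by-line search would miss.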
My solution is based on regex, and finds all possible matches using a regex inside a while loop. Thanks to @Smac89 for improving it by transforming it into a generator:
import re
string = """CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""
# Generator
def find_cysteine2(string):
    # Create a loop that will utilize regex multiple times
    # in order to capture matches within groups
    while True:
        # Find a match
        data = re.search(r'(\w{5}C\w{5})', string)
        # If match exists, let's collect the data
        if data:
            # Collect the string
            yield data.group(1)
            # Shrink the string to not include
            # the previous result
            location = data.start() + 1
            string = string[location:]
        # If there are no matches, stop the loop
        else:
            break
print(list(find_cysteine2(string)))
# ['QDIQLCGMGIL', 'ILPEHCIIDIT', 'TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
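An alternative to shrinking the string in a loop, offered here as a sketch and not as the answerer's method, is to put the whole pattern inside a lookahead. A lookahead consumes no characters, so re.finditer tries every position and overlapping windows are reported too. The function name find_cysteine_windows is hypothetical; the sample sequence is one line from the question.

```python
import re

def find_cysteine_windows(sequence):
    """Return all 11-character windows with 'C' in the middle, overlaps included."""
    # (?=(...)) matches at a position without consuming it; group 1 holds the window
    return [m.group(1) for m in re.finditer(r'(?=(\w{5}C\w{5}))', sequence)]

seq = 'LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA'
print(find_cysteine_windows(seq))
# ['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
```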

Find the position of when a particular sequence occurs in a string using a sliding window in Python 2.7

Suppose there's a window list:
text='abcdefg'
window_list=[text[i:i+3] for i in range(len(text)-3)]
print window_list
['abc', 'bcd', 'cde', 'def']
for i in window_list:
    for j,k in zip(range(len(text)),i):
        print j,k
0 a
1 b
2 c
0 b
1 c
2 d
0 c
1 d
2 e
0 d
1 e
2 f
I'm trying to make it so that when
(j==0 and k=='c') and (j==1 and k=='d') and (j==2 and k=='e')
it would give me the start and end positions where that occurs in the string text,
so it would give me
[2-4]
Have you thought to do it in this way?
>>> text='abcdefg'
>>> window_list=[text[i:i+3] for i in range(len(text)-3)]
>>> ["-".join([str(i),str(i+len(w))]) for i,w in enumerate(window_list) if w == 'cde'] #for single item
['2-5']
>>> ["-".join([str(i),str(i+len(w))]) for i,w in enumerate(window_list) if w in ['cde','def']] # for multiple items
['2-5', '3-6']
>>>
Note: enumerate the list and search for those items which matches the condition. Return the index followed by the end position (which is index + length of the sub-sequence). Please note, the result would be a string rather than what you are expecting.
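For comparison, here is a sketch of the same sliding-window idea that returns tuples with an inclusive end position, matching the [2-4] the question asks for. It is written in Python 3 syntax (the thread itself uses Python 2.7), and window_spans is a hypothetical name. Note that range(len(text) - n + 1) also covers the final window, which range(len(text)-3) in the question skips ('efg' is missing from its window list).

```python
def window_spans(text, pattern):
    """Return (start, end) pairs, end inclusive, for each window equal to pattern."""
    n = len(pattern)
    # Slide a window of width n over text; the +1 includes the last window
    return [(i, i + n - 1)
            for i in range(len(text) - n + 1)
            if text[i:i + n] == pattern]

print(window_spans('abcdefg', 'cde'))  # [(2, 4)]
print(window_spans('abcdefg', 'efg'))  # [(4, 6)]
```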
import re
seq = 'abcdefabcdefabcdefg'
for match in re.finditer('abc', seq):
    print match.start(), match.end()
Logic:
All you need to do is check whether your pattern occurs in the text. This is easy with Python's built-in string method find. Once you have the start position, add the length of the pattern and subtract 1 to get the end position. That's it!
The Code:
text = "abcdefghifjklmnopqrstuvwxyz"
pattern = "abc"
start_position = text.find(pattern)
if start_position > -1:
    # end position is inclusive, hence the -1
    end_position = start_position + len(pattern) - 1
    print start_position, "-", end_position
else:
    print "Pattern not found"
The Output:
0 - 2
Reference:
Check Official Python String Functions Documentation
