How to count the number of triplets found in a string? - python

string1 = "abbbcccd"
string2 = "abbbbdccc"
How do I find the number of triplets in a string? A triplet is a character that appears 3 times in a row. Triplets can also overlap, as in string2 (abbbbdccc), where bbbb contains two triplets.
The output should be:
2 <-- string 1
3 <-- string 2
I'm new to Python and Stack Overflow, so any help or advice on question writing would be much appreciated.

Try iterating through the string with a while loop, checking whether each character and the two characters after it are the same. This handles overlapping triplets as well.
string1 = "abbbcccd"
string2 = "abbbbdccc"
string3 = "abbbbddddddccc"
def triplet_count(string):
    it = 0  # current index into the string
    cnt = 0  # count of number of triplets
    while it < len(string) - 2:
        if string[it] == string[it + 1] == string[it + 2]:
            cnt += 1
        it += 1
    return cnt
print(triplet_count(string1))  # prints 2
print(triplet_count(string2))  # prints 3
print(triplet_count(string3))  # prints 7
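The same sliding comparison can be written more compactly. The following is only a sketch, and the helper name count_triplets is an assumption rather than part of the answer above; it pairs each character with its two successors using zip:

```python
def count_triplets(text):
    """Count positions where three identical characters appear in a row."""
    # zip stops at the shortest input, so only complete windows of three
    # consecutive characters are compared; overlapping windows are included.
    return sum(a == b == c for a, b, c in zip(text, text[1:], text[2:]))


print(count_triplets("abbbcccd"))        # 2
print(count_triplets("abbbbdccc"))       # 3
print(count_triplets("abbbbddddddccc"))  # 7
```

Because True counts as 1 when summed, no explicit counter is needed.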

This simple script should work...
my_string = "aaabbcccddddd"
# Some required variables
old_char = None
two_in_a_row = False
triplet_count = 0
# Iterates through characters of a given string
for char in my_string:
    # Checks if previous character matches current character
    if old_char == char:
        # Checks if there already has been two in a row (and hence now a triplet)
        if two_in_a_row:
            triplet_count += 1
        two_in_a_row = True
    # Resets the two_in_a_row boolean variable if there's a non-match.
    else:
        two_in_a_row = False
    old_char = char
print(triplet_count)  # prints 5 for the example my_string I've given
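A run-based view of the same idea, as a hedged sketch: a run of n identical characters contains n - 2 overlapping triplets, so the runs can be measured with itertools.groupby. The function name count_triplets_by_runs is hypothetical.

```python
from itertools import groupby


def count_triplets_by_runs(text):
    """Count overlapping triplets by looking at runs of equal characters."""
    total = 0
    for _, run in groupby(text):
        length = sum(1 for _ in run)
        # A run of n identical characters holds n - 2 overlapping triplets.
        total += max(0, length - 2)
    return total


print(count_triplets_by_runs("aaabbcccddddd"))  # 5
```

For the example string the runs are aaa, bb, ccc and ddddd, contributing 1, 0, 1 and 3 triplets.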

Related

How to find the most amount of shared characters in two strings? (Python)

yamxxopd
yndfyamxx
Output: 5
I am not quite sure how to find the length of the longest run of characters shared by two strings. For the strings above, the longest shared run is "yamxx", which is 5 characters long.
"xx" would not be a solution because it is not the longest shared run. The longest is "yamxx", so the output would be 5.
I am quite new to Python and Stack Overflow, so any help would be much appreciated!
Note: The characters must appear in the same order in both strings.
Here is a simple, efficient solution using dynamic programming.
def longest_substring(X, Y):
    m, n = len(X), len(Y)
    # LCSuff[i][j] is the length of the longest common suffix of X[:i] and Y[:j]
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0  # length of the longest common substring found so far
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    print(result)
longest_substring("abcd", "arcd")  # prints 2
longest_substring("yammxdj", "nhjdyammx")  # prints 5
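If the shared text itself is wanted rather than only its length, the same dynamic-programming idea can record where the best match ends. This is a minimal sketch under that assumption; the name longest_common_substring and the two-row layout are illustrative choices, not part of the answer above.

```python
def longest_common_substring(x, y):
    """Return the longest common contiguous substring of x and y."""
    prev = [0] * (len(y) + 1)   # common-suffix lengths for the previous row
    best_len, best_end = 0, 0   # best length so far and its end index in x
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix that ended one character earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return x[best_end - best_len:best_end]


print(longest_common_substring("yamxxopd", "yndfyamxx"))  # yamxx
print(longest_common_substring("abcd", "arcd"))           # cd
```

Only the previous row is kept, so memory use grows with the length of y rather than with the product of both lengths.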
This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1                                 # Let's start with the longest possible sub-string
while (not found) and sub_length:                # Loop, over decreasing lengths of sub-string
    for start in range(l_1 - sub_length + 1):    # Loop, over all start-positions of sub-string
        sub_str = s_1[start:(start+sub_length)]  # Get the sub-string at that start-position
        if sub_str in s_2:                       # If found a match for the sub-string, in s_2
            found = True                         # Stop trying with smaller lengths of sub-string
            break                                # Stop trying with this length of sub-string
    else:                                        # If no matches found for this length of sub-string
        sub_length -= 1                          # Let's try a smaller length for the sub-strings
print(f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5
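The standard library can also do this search. As a sketch, difflib.SequenceMatcher.find_longest_match returns the longest matching block between two sequences; the wrapper name longest_common_run is an assumption made for illustration.

```python
from difflib import SequenceMatcher


def longest_common_run(a, b):
    """Return the longest block of characters that appears in both a and b."""
    # autojunk=False disables the heuristic that ignores very frequent
    # elements in long sequences, so every character is considered.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


run = longest_common_run("yamxxopd", "yndfyamxx")
print(run, len(run))  # yamxx 5
```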
s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# creating and initializing a string without repetition
s = ""
for x in s1:
    if x not in s:
        s = s + x
for x in s:
    if x in s2:
        counter = counter + 1
# Note: this counts the distinct characters of s1 that also occur in s2,
# not the longest common substring; for these inputs both happen to be 5
print(counter)  # displays 5
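Counting distinct shared characters, which is what the loop above computes, can also be expressed as a set intersection. This is a sketch with a hypothetical helper name, shared_characters. It ignores order and repetition, so it matches the longest-substring answer only by coincidence for these inputs.

```python
def shared_characters(a, b):
    """Return the set of distinct characters that occur in both strings."""
    return set(a) & set(b)


shared = shared_characters("yamxxopd", "yndfyamxx")
print(len(shared))     # 5
print(sorted(shared))  # ['a', 'd', 'm', 'x', 'y']
```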

Why am I having a string indexing problem in Python?

I'm trying to understand why I get the same index again when I use .index or .find.
When a letter is repeated, why do I get index 2 again rather than 3? And what is an alternative way to get index 3 for the second 'l'?
text = 'Hello'
for i in text:
    print(text.index(i))
The output is:
0
1
2
2
4
It's because .index() returns the lowest or first index of the substring within the string. Since the first occurrence of l in hello is at index 2, you'll always get 2 for "hello".index("l").
So when you're iterating through the characters of hello, you get 2 twice and never 3 (for the second l). Expanded into separate lines, it looks like this:
"hello".index("h") # = 0
"hello".index("e") # = 1
"hello".index("l") # = 2
"hello".index("l") # = 2
"hello".index("o") # = 4
Edit: Alternative way to get all indices:
One way to print all the indices (although not sure how useful this is since it just prints consecutive numbers) is to remove the character you just read from the string:
removed = 0
string = "hello world"  # original string
for char in string:
    # actual index = index() in the shortened string + how many chars we've removed
    print("{} at index {}".format(char, string.index(char) + removed))
    string = string[1:]  # remove the char we just read
    removed += 1  # increment removed count
text = 'Hello'
for idx, ch in enumerate(text):
    print(f'char {ch} at index {idx}')
Output:
char H at index 0
char e at index 1
char l at index 2
char l at index 3
char o at index 4
If you want to find the second occurrence, you should search in the substring after the first occurrence:
text = 'Hello'
first_index = text.index('l')
print('First index:', first_index)
second_index = text.index('l', first_index+1)  # Search in the substring after the first occurrence
print('Second index:', second_index)
print('Second index:', second_index)
The output is:
First index: 2
Second index: 3
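The same "search again after the previous hit" step can be repeated to collect every occurrence. A minimal sketch, assuming a helper named find_all (not a built-in), that uses str.find with its optional start argument:

```python
def find_all(text, sub):
    """Yield the index of every occurrence of sub in text (overlaps included)."""
    start = text.find(sub)
    while start != -1:  # str.find returns -1 when nothing more is found
        yield start
        start = text.find(sub, start + 1)


print(list(find_all("Hello", "l")))        # [2, 3]
print(list(find_all("hello world", "o")))  # [4, 7]
```

str.find is used here instead of str.index because it signals "not found" with -1 rather than raising ValueError, which keeps the loop condition simple.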

remove substring from string using python

I would like to remove the last 2 sub-strings from a string, as in the following examples:
str="Dev.TTT.roker.{i}.ridge.{i}."
str1="Dev.TTT.roker.{i}.ridge.{i}.obj."
If a {i} appears between the dots within those last two sub-strings, it has to be removed as well.
So the result of the Python script should look like this:
The expected result for str is: Dev.TTT.
The expected result for str1 is: Dev.TTT.roker.{i}.
You can simply split by . and skip over any empty string or {i}.
Also, do not use the name of a built-in as a variable name. In your case, don't use str as a variable name.
def solve(s):
    x = s.split('.')
    cnt = 2  # number of named segments still to remove
    l = len(x) - 1  # index of the last piece still kept
    while cnt and l:
        if x[l] == '' or x[l] == '{i}':
            # empty pieces and {i} placeholders are dropped without counting
            l -= 1
            continue
        else:
            cnt -= 1
        l -= 1
    return '.'.join(x[:l+1]) + '.'
str1="Dev.TTT.roker.{i}.ridge.{i}."
str2="Dev.TTT.roker.{i}.ridge.{i}.obj."
print(solve(str1))
print(solve(str2))
output:
Dev.TTT.
Dev.TTT.roker.{i}.
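The same walk from the end of the string can be sketched with list.pop instead of an explicit index. The function name drop_last_segments and its count and placeholder parameters are assumptions made for illustration.

```python
def drop_last_segments(path, count=2, placeholder="{i}"):
    """Remove the last `count` named segments and the placeholders after them."""
    parts = [part for part in path.split(".") if part]  # ignore empty pieces
    removed = 0
    while parts and removed < count:
        # Placeholders are discarded without counting towards `count`.
        if parts.pop() != placeholder:
            removed += 1
    return ".".join(parts) + "."


print(drop_last_segments("Dev.TTT.roker.{i}.ridge.{i}."))      # Dev.TTT.
print(drop_last_segments("Dev.TTT.roker.{i}.ridge.{i}.obj."))  # Dev.TTT.roker.{i}.
```

A placeholder that precedes the last removed segment is kept, which matches the expected result for the second string.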

extract substring pattern

I have a long file with about 1200 sequences, like this:
>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
I want to extract every pattern that has a cysteine (C) in the middle, preceded by five characters and followed by five more, such as xxxxxCxxxxx.
The output should look like this:
QDIQLCGMGIL
ILPEHCIIDIT
TISDNCVVIFS
FSKTSCSYCTM
This program only gives the position of C; it does not do what I want:
pos=[]
def find(ch,string1):
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
            return pos
z=find('C','AWERQRTCWERTYCTAAAACTTCTTT')
print(z)
You need to return outside the loop; you are returning on the first match, so you only ever get a single index in your list:
def find(ch,string1):
    pos = []
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
    return pos # outside
You can also use enumerate with a list comp in place of your range logic:
def indexes(ch, s1):
    return [index for index, char in enumerate(s1) if char == ch and 5 <= index <= len(s1) - 6]
Each index in the list comp is the character index and each char is the actual character so we keep each index where char is equal to ch.
If you want the five chars that are both sides:
In [24]: s="CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP"
In [25]: inds = indexes("C",s)
In [26]: [s[i-5:i+6] for i in inds]
Out[26]: ['QDIQLCGMGIL', 'ILPEHCIIDIT']
I added checking the index as we obviously cannot get five chars before C if the index is < 5 and the same from the end.
You can do it all in a single function, yielding a slice when you find a match:
def find(ch, s):
    ln = len(s)
    for i, char in enumerate(s):
        if ch == char and 5 <= i <= ln - 6:
            yield s[i - 5:i + 6]
Presuming the data in your question is actually two lines from your file, like:
s = """>3fm8|A|A0JLQ2CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDALYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCY"""
Running:
for line in s.splitlines():
    print(list(find("C", line)))
would output:
['0JLQ2CFLVNL', 'QDIQLCGMGIL', 'ILPEHCIIDIT']
['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
Which gives six matches not four as your expected output suggest so I presume you did not include all possible matches.
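If the file is instead laid out as displayed in the question, with a header line starting with > followed by wrapped sequence lines, the wrapped lines need to be joined before searching. The following is only a sketch under that assumption; read_fasta and windows are hypothetical helper names.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # emit the final record
        yield header, "".join(chunks)


def windows(seq, ch="C", flank=5):
    """Yield each occurrence of ch with `flank` characters on both sides."""
    for i, c in enumerate(seq):
        if c == ch and flank <= i <= len(seq) - flank - 1:
            yield seq[i - flank:i + flank + 1]


data = """>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""

for header, seq in read_fasta(data.splitlines()):
    print(header, list(windows(seq)))
# 3fm8|A|A0JLQ2 ['QDIQLCGMGIL', 'ILPEHCIIDIT']
# 2ht9|A|A0JLT0 ['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
```

For a real file, an open file object can be passed to read_fasta in place of data.splitlines(), since it only needs an iterable of lines. Keeping the header out of the searched text avoids matches such as 0JLQ2CFLVNL that straddle the header and the sequence.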
You can also speed up the code using str.find, starting at the last match index + 1 for each subsequent match:
def find(ch, s):
    ln, i = len(s) - 6, s.find(ch)
    # Stop when there are no more matches or too few characters remain after the match
    while i != -1 and i <= ln:
        if i >= 5:  # skip matches with fewer than five characters before them
            yield s[i - 5:i + 6]
        i = s.find(ch, i + 1)
Which will give the same output. Of course, if the strings cannot overlap, you can start looking for the next match much further along the string each time.
My solution is based on regex, and finds all possible matches using a regex inside a while loop. Thanks to @Smac89 for improving it by transforming it into a generator:
import re
string = """CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""
# Generator
def find_cysteine2(string):
    # Create a loop that will utilize regex multiple times
    # in order to capture matches within groups
    while True:
        # Find a match
        data = re.search(r'(\w{5}C\w{5})', string)
        # If match exists, let's collect the data
        if data:
            # Collect the string
            yield data.group(1)
            # Shrink the string so the next search starts
            # one character after the start of this match
            location = data.start() + 1
            string = string[location:]
        # If there are no matches, stop the loop
        else:
            break
print([x for x in find_cysteine2(string)])
# ['QDIQLCGMGIL', 'ILPEHCIIDIT', 'TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
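Overlapping matches can also be collected in a single call by wrapping the pattern in a lookahead, so no characters are consumed between matches. This is a sketch of that variation, not the answer's method; the names PATTERN and cysteine_windows are assumptions.

```python
import re

# The lookahead (?=...) matches without consuming characters, so windows
# that overlap each other are all reported by findall.
PATTERN = re.compile(r"(?=(\w{5}C\w{5}))")


def cysteine_windows(sequence):
    """Return every 11-character window with C in the middle."""
    return PATTERN.findall(sequence)


print(cysteine_windows("FSKTSCSYCTMAKK"))
# ['FSKTSCSYCTM', 'TSCSYCTMAKK']
```

With one capturing group in the pattern, findall returns the captured windows themselves rather than the empty lookahead matches.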

How do you reverse the words in a string using python (manually)? [duplicate]

I know there are methods that python already provides for this, but I'm trying to understand the basics of how those methods work when you only have the list data structure to work with. If I have a string hello world and I want to make a new string world hello, how would I think about this?
And then, if I can do it with a new list, how would I avoid making a new list and do it in place?
Split the string, make a reverse iterator then join the parts back.
' '.join(reversed(my_string.split()))
If you are concerned with multiple spaces, change split() to split(' ')
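The difference between the two calls is easiest to see on a string with repeated spaces. A small illustrative sketch (the sample text is made up):

```python
text = "hello   world"  # three spaces between the words

# split() treats any run of whitespace as a single separator.
print(text.split())                          # ['hello', 'world']
print(" ".join(reversed(text.split())))      # world hello

# split(' ') splits at every single space and keeps the empty strings,
# so the original spacing survives the round trip.
print(text.split(" "))                       # ['hello', '', '', 'world']
print(" ".join(reversed(text.split(" "))))   # world   hello
```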
As requested, I'm posting an implementation of split (by GvR himself from the oldest downloadable version of CPython's source code: Link)
def split(s, whitespace=' \n\t'):
    res = []
    i, n = 0, len(s)
    while i < n:
        # skip leading whitespace
        while i < n and s[i] in whitespace:
            i = i+1
        if i == n:
            break
        # scan to the end of the current word
        j = i
        while j < n and s[j] not in whitespace:
            j = j+1
        res.append(s[i:j])
        i = j
    return res
I think there are now more Pythonic ways of doing that (maybe groupby), and the original source had a bug (if i = n:, corrected to ==).
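One possible reading of the groupby suggestion, offered as a sketch rather than as how CPython implements str.split: group consecutive characters by whether they are whitespace and keep only the non-whitespace groups. The name split_words is hypothetical.

```python
from itertools import groupby


def split_words(s, whitespace=" \n\t"):
    """Split s into words, treating runs of whitespace as one separator."""
    return [
        "".join(group)
        for is_space, group in groupby(s, key=lambda ch: ch in whitespace)
        if not is_space
    ]


print(split_words("  hello \t world\n"))  # ['hello', 'world']
```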
Original Answer
# Note: this is Python 2 code; array('c', ...) and tostring() are not
# available in current Python 3 releases.
from array import array
def reverse_array(letters, first=0, last=None):
    "reverses the letters in an array in-place"
    if last is None:
        last = len(letters)
    last -= 1
    while first < last:
        letters[first], letters[last] = letters[last], letters[first]
        first += 1
        last -= 1
def reverse_words(string):
    "reverses the words in a string using an array"
    words = array('c', string)
    # reverse the whole string, then reverse each word back into reading order
    reverse_array(words, first=0, last=len(words))
    first = last = 0
    while first < len(words) and last < len(words):
        if words[last] != ' ':
            last += 1
            continue
        reverse_array(words, first, last)
        last += 1
        first = last
    if first < last:
        reverse_array(words, first, last=len(words))
    return words.tostring()
Answer using list to match updated question
def reverse_list(letters, first=0, last=None):
    "reverses the elements of a list in-place"
    if last is None:
        last = len(letters)
    last -= 1
    while first < last:
        letters[first], letters[last] = letters[last], letters[first]
        first += 1
        last -= 1
def reverse_words(string):
    """reverses the words in a string using a list, with each character
    as a list element"""
    characters = list(string)
    reverse_list(characters)
    first = last = 0
    while first < len(characters) and last < len(characters):
        if characters[last] != ' ':
            last += 1
            continue
        reverse_list(characters, first, last)
        last += 1
        first = last
    if first < last:
        reverse_list(characters, first, last=len(characters))
    return ''.join(characters)
Besides renaming, the only change of interest is the last line.
You have a string:
text = "A long string to test this algorithm"
Split the string (at word boundaries -- no arguments to split):
words = text.split()
Reverse the list obtained -- either using slicing or a function:
reversed_words = words[::-1]
Concatenate all words with spaces in between -- also known as joining:
result = " ".join(reversed_words)
Now, you don't need so many temporaries; combining them into one line gives:
result = " ".join(text.split()[::-1])
text = "hello world"
" ".join(text.split()[::-1])  # 'world hello'
