I apologize if this question has been answered elsewhere on this site, but I have searched for a while and have not found a similar question. For some slight context, I am working with RNA sequences.
Without diving into the Bio aspect, my question boils down to this:
Given a string and an index/position, I want to find the largest matching substring based on that position.
For example:
Input
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
position = 13 # the f in the substring 'stackoverflow'
Desired Output
rflo
So basically, despite 'stackov' being the longest repeated substring within the string, I only want the largest repeated substring based on the index given.
Any help is appreciated. Thanks!
Edit
I appreciate the answers provided thus far. However, I intentionally made position equal to 13 in order to show that I want to search and expand on either side of the starting position, not just to the right.
We iteratively check longer and longer substrings starting at position position simply checking if they occur in the remaining string using the in keyword. j is the length of the substring that we currently test, which is string[index:index+j] and longest keeps track of the longest substring seen so far. We can break as soon as the sequence starting at position does not occur anymore with the current length j
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
position = 13
index=position-1
longest=0
for j in range(1, (len(string)-index)//2):
if string[index:index+j] in string[index+j:]:
longest=j
else:
break
print(longest)
print(string[index:index+longest])
Output:
4
rflo
Use the in keyword to check for presence in the remainder of the string, like this:
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
# Python string indices start at 0
position = 12
for sub_len in range(1, len(string) - position):
# Simply check if the string is present in the remainder of the string
sub_string = string[position:position + sub_len]
if sub_string in string[position + sub_len:] or sub_string in string[0:position]:
continue
break
# The last iteration of the loop did not return any occurrences, print the longest match
print(string[position:position + sub_len - 1])
Output:
rflo
If you set position = 32, this returns stackov, showing how it searches from the beginning as well.
Related
This question already has answers here:
How do I reverse a string in Python?
(19 answers)
Closed 2 years ago.
I am trying to create a program in which the user inputs a statement containing two '!' surrounding a string. (example: hello all! this is a test! bye.) I am to grab the string within the two exclamation points, and print it in reverse letter by letter. I have been able to find the start and endpoints that contain the statement, however I am having difficulties creating an index that would cycle through my variable userstring in reverse and print.
test = input('Enter a string with two "!" surrounding portion of the string:')
expoint = test.find('!')
#print (expoint)
twoexpoint = test.find('!', expoint+1)
#print (twoexpoint)
userstring = test[expoint+1 : twoexpoint]
#print(userstring)
number = 0
while number < len(userstring) :
letter = [twoexpoint - 1]
print (letter)
number += 1
twoexpoint - 1 is the last index of the string you need relative to the input string. So what you need is to start from that index and reduce. In your while loop:
letter = test[twoexpoint- number - 1]
Each iteration you increase number which will reduce the index and reverse the string.
But this way you don't actually use the userstring you already found (except for the length...). Instead of caring for indexes, just reverse the userstring:
for letter in userstring[::-1]:
print(letter)
Explanation we use regex to find the pattern
then we loop for every occurance and we replace the occurance with the reversed string. We can reverse string in python with mystring[::-1] (works for lists too)
Python re documentation Very usefull and you will need it all the time down the coder road :). happy coding!
Very usefull article Check it out!
import re # I recommend using regex
def reverse_string(a):
matches = re.findall(r'\!(.*?)\!', a)
for match in matches:
print("Match found", match)
print("Match reversed", match[::-1])
for i in match[::-1]:
print(i)
In [3]: reverse_string('test test !test! !123asd!')
Match found test
Match reversed tset
t
s
e
t
Match found 123asd
Match reversed dsa321
d
s
a
3
2
1
You're overcomplicating it. Don't bother with an index, simply use reversed() on userstring to cycle through the characters themselves:
userstring = test[expoint+1:twoexpoint]
for letter in reversed(userstring):
print(letter)
Or use a reversed slice:
userstring = test[twoexpoint-1:expoint:-1]
for letter in userstring:
print(letter)
This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
for i in postIdent.split("ff"):
rawData = i
required_Id = rawData[-8:]
x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 digits of each split using a slice operation in a list-comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
print(x[-8:])
This should print
909a9090
90434390
sdfs9000
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None # our next offset of
last = 0
l = 8
print(fullstr)
while True:
found = fullstr.find(search, last)
if found == -1:
break
preceeding = fullstr[found-l:found]
print("At position {} found preceeding characters '{}' ".format(found,preceeding))
last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.
I have a regex python script to go over Hex data and find patterns which looks like this
r"(.{6,}?)\1{2,}"
all it does is look for at least 6 character long hex strings that repeat and at least have two instances of it repeating. My issue is it is also finding substrings inside larger strings it has already found for example:
if it was "a00b00a00b00a00b00a00b00a00b00a00b00" it would find 2 instances of "a00b00a00b00a00b00" and 6 instances of "a00b00" How could I go about keeping only the longest patterns found and ignoring even looking for shorter patterns without more hardcoded parameters?
#!/usr/bin/python
import fnmatch
pattern_string = "abcdefabcdef"
def print_pattern(pattern, num):
n = num
# takes n and splits it by that value in this case 6
new_pat = [pattern[i:i+n] for i in range(0, len(pattern), n)]
# this is the hit counter for matches
match = 0
# stores the new value of the match
new_match = ""
#loops through the list to see if it matches more than once
for new in new_pat:
new_match = new
print new
#if matches previous keep adding to match
if fnmatch.fnmatch(new, new_pat[0]):
match += 1
if match:
print "Count: %d\nPattern:%s" %(match, new_match)
#returns the match
return new_match
print_pattern(pattern_string, 6)
regex is better but this was funner to write
Hers is my code challenge and my code. I'm stuck, not sure why its not working properly
-write a function named plaintext that takes a single parameter of a string encoded in this format: before each character of the message, add a digit and a series of other characters. the digit should correspond to the number of characters that will precede the message's actual, meaningful character. it should return the decoded word in string form
""" my pseudocode:
#convert string to a list
#enumerate list
#parse string where the element and the index plus one returns the desired index
#return decoded message of desired indexes """
encoded_message = "0h2ake1zy"
#encoded_message ="2xwz"
#encoded_message = "0u2zyi2467"
def plaintext(string):
while(True):
#encoded_message = raw_input("enter encoded message:")
for index, character in enumerate(list(encoded_message)):
character = int(character)
decoded_msg = index + character + 1
print decoded_msg
You need to go iterate over the string's characters, and in each iteration skip the specified number of characters and take the following one:
def plaintext(s):
res = ''
i = 0
while i < len(s):
# Skip the number of chars specified
i += int(s[i])
# Take the letter after them
i += 1
res += s[i]
# Move on to the next position
i += 1
return res
Here are some hints.
First decide what looping construct you want to use. Python offers choices: iterate over individual characters, loop over the indices of the characters, while loop. You certainly don't want both a while and a for loop.
You're going to be processing the string in groups, "0h", then "2ake", then "1zy" to take your first example string. What is the condition that will cause you to exit the loop?
Now, look at your line decoded_msg = index + character + 1. To construct the decoded string, you want to index into the string itself, based on the digit's value. So, this line should contain something like, encoded_message[x] for some x, that you have to figure out using the digit.
Also, you'll want to accumulate characters as you go along. So you'll need to begin the loop with an empty result string decoded_msg="" and add a character to it decoded_msg += ... for each iteration of the loop.
I hope this helps a little more than just giving the answer.
I am trying to import the alphabet but split it so that each character is in one array but not one string. splitting it works but when I try to use it to find how many characters are in an inputted word I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I would go around solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).
There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
word_sum = 0
for char in word:
if char in string.ascii_letters:
word_sum += 1
total_sum += word_sum
# Length of characters in the ascii letters alphabet:
# total_sum == 12
# Length of all characters in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = ( 1 - distance / len(correct)) * 10 # mark == 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
join is a class method of str, you can do
''.join(splitalphabet)
or
str.join('', splitalphabet)
To convert the list splitalphabet to a string, so you can use it with the find() function you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))
I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])
import string
# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)
def letters_in_word(word):
return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2
While join creates the string from the split, you would not have to do that as you can issue the find on the original string (alphabet). However, I do not think is what you are trying to do. Note that the find that you are trying attempts to find the splitalphabet (actually alphabet) within year3wordlist[x] which will always fail (-1 result)
If what you are trying to do is to get the indices of all the letters of the word list within the alphabet, then you would need to handle it as
for each letter in the word of the word list, determine the index within alphabet.
j = []
for c in word:
j.append(alphabet.find(c))
print j
On the other hand if you are attempting to find the index of each character within the alphabet within the word, then you need to loop over splitalphabet to get an individual character to find within the word. That is
l = []
for c within splitalphabet:
j = word.find(c)
if j != -1:
l.append((c, j))
print l
This gives the list of tuples showing those characters found and the index.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine if the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either say word.isalpha() and get True or False or check each character of word to be isalpha()