Splice a string, multiple times, using a keyword - python

I am trying to break a string apart by removing segments that occur between two words.
Example:
AGCGUGUGAGAGCUCCGA
I will remove the parts that occur between: GUGU and AGAG
So, the new string will be:
AGCCUCCGA
I wrote code that uses a while loop to keep 'splicing' the string over and over until it can't find GUGU and AGAG in the string. The process works most of the time.
I encountered one case where the input is extremely long, and my code gets stuck in an infinite loop. I don't understand why.
I was hoping that someone could review it and help me improve what I am doing.
def splice(strand):
    while True:
        initial = strand.find('GUGU')
        final = strand.find('AGAG')
        if initial == -1:
            break
        if final == -1:
            break
        strand = strand[:initial] + strand[final+4:]
    return strand
if __name__ == "__main__":
    strand = input("Input strand: ")
    print()
    spliced = splice(strand)
    print("Output is {}".format(spliced))
The case where it is failing is:
GUGUAGAGGUCACAGUGUAAAAGCUCUAGAGCAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG
Expected result:
GUCACACAGACAGAUGUAGAGCAAAGGCAACA
I haven't encountered any other cases where the code will not work.

Your code doesn't work when the first AGAG occurs before the first GUGU. After the second iteration on that input, the value of strand is
GUCACACAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG
Then initial is 21 and final is 17, so you do:
strand = strand[:21] + strand[21:]
which just sets strand back to the same value, so you get stuck in a loop.
The str.find() method has an optional start argument, so you can tell it to start looking for AGAG after the end of the GUGU it just found:
final = strand.find("AGAG", initial+4)
You can also do the whole thing with a regexp substitution:
import re
strand = re.sub(r'GUGU(.*?)AGAG', '', strand)
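Putting the two suggestions together, a minimal sketch might look like the following. The names START, END and splice_regex are mine, not from the question, and the loop is only the original function with the start argument added to the second find().

```python
import re

START, END = "GUGU", "AGAG"  # the two marker words from the question

def splice(strand):
    """Repeatedly cut out START...END until no complete pair is left."""
    while True:
        initial = strand.find(START)
        if initial == -1:
            break
        # Only look for END after the START that was just found
        final = strand.find(END, initial + len(START))
        if final == -1:
            break
        strand = strand[:initial] + strand[final + len(END):]
    return strand

def splice_regex(strand):
    """Single-pass variant: drop every non-overlapping START...END match."""
    return re.sub(r"GUGU.*?AGAG", "", strand)

short = "AGCGUGUGAGAGCUCCGA"
print(splice(short))
print(splice_regex(short))
```

The two versions agree on both inputs from the question. They can differ on other inputs, because a single re.sub pass does not rescan text that joins up after a removal, while the loop does.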

import re
pattern = '(.*?)GUGU.*?AGAG'
s1 = 'AGCGUGUGAGAGCUCCGA'
s2 = 'GUGUAGAGGUCACAGUGUAAAAGCUCUAGAGCAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG'
print(''.join(re.findall(pattern, s1)) + s1[s1.rfind('AGAG')+4:])
print(''.join(re.findall(pattern, s2)) + s2[s2.rfind('AGAG')+4:])
AGCCUCCGA
GUCACACAGACAGAUGUAGAGCAAAGGCAACA

Related

How do I read a set of characters from a specific position in a string in python

So I am writing my own cypher and decoder for a bit of fun but I've gotten stuck on my decipher program where I need to move up by 1 character in my string in order to decipher the next "block" of characters.
cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"
#989E86
Above is the string that I am trying to move through. Essentially, I am trying to move on from my first base 6 string to the second, which in this case would be ",#808000", and so on in increments of 10 characters from each comma.
I've tried a few ideas, such as adding 1 character to account for the space between each base 6 string, but I wasn't able to figure it out. I have also started working on a variable set to integer 0 that is incremented by 10 after each string is converted back into hexadecimal.
But that hasn't yielded any results yet, so I'm hoping someone here may be able to stop me from going overboard when I could instead use a much simpler solution.
while str(cyinput2) != "":
    S = 0
    S1 = 9
    hexstart = ','
    hexend = ':'
    start_index = cyinput2.find(hexstart) + len(hexstart)
    end_index = cyinput2.find(hexend)
    block = cyinput2[start_index:end_index]
    print(block)
You could simply convert your string to a list by using split.
If you want clean blocks, you can use replace first:
# Removing useless chars in order to have a clean split
cyinput2 = cyinput2.replace(',', '')
cyinput2 = cyinput2.replace(':', '')
# From : ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"
# to : "#96c8a2 #808000 #96c8a2 #e1a95f #808000 #6f00ff"
# Split
cyinput2 = cyinput2.split(' ')
# From : "#96c8a2 #808000 #96c8a2 #e1a95f #808000 #6f00ff"
# to : ["#96c8a2", "#808000", "#96c8a2", "#e1a95f", "#808000", "#6f00ff"]
for block in cyinput2:
    print(block)
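You can use this while loop. Slice off each 10-character block after printing it; calling replace() on the block instead would also delete every later copy of the same block, so repeated values would be skipped.
cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff: "
while len(cyinput2) > 9:
    print(cyinput2[1:10])
    cyinput2 = cyinput2[10:]
Output
#96c8a2:
#808000:
#96c8a2:
#e1a95f:
#808000:
#6f00ff: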
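If the blocks are always a '#' followed by six hexadecimal digits, which is an assumption based on the sample string, a regular expression can pull them out without any index arithmetic. The names blocks and values below are just for illustration.

```python
import re

cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"

# Assumption: every block is '#' followed by exactly six hex digits
blocks = re.findall(r"#[0-9a-fA-F]{6}", cyinput2)

for block in blocks:
    print(block)

# Drop the leading '#' and parse the rest as a base-16 integer
values = [int(block[1:], 16) for block in blocks]
```

This keeps repeated blocks and does not depend on the separators being exactly one comma, one colon and one space.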

Trying to make a program that simulates typing

I'm trying to make a program where each keypress prints the next character in a predetermined string, so it looks like the user is typing the text.
Here's the code I'm trying to use:
import getch
def typing(x):
    letter = 0
    for i in range(0, len(x)):
        getch.getch()
        print(x[letter], end = "")
        letter += 1
typing("String")
What happens here is that you need to press 6 keys (the length of the string) and then everything prints at once. I can sort of fix this by removing the , end = "", which makes the letters appear one at a time, but then the outcome looks like this:
S
t
r
i
n
g
Any ideas for making the letters appear one at a time and stay on the same line?
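You can try this code, which works for me. The important part is flush=True, which writes each character out immediately instead of leaving it in the output buffer until a newline:
import time
def typewrite(word: str):
    for i in word:
        time.sleep(0.1)
        print(i, end="", flush=True)
typewrite("Hello World")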
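To keep the one-character-per-keypress behaviour from the question, the same flush idea can be combined with whatever blocks until a key is pressed. The sketch below passes that in as a function (wait_for_key is a hypothetical parameter name), so in the real program it would be getch.getch; here a do-nothing stand-in and an in-memory buffer are used so the example runs without a keyboard.

```python
import io
import sys

def typing(text, wait_for_key, out=sys.stdout):
    """Write one character of text per call to wait_for_key()."""
    for ch in text:
        wait_for_key()   # blocks until a key is pressed in the real program
        out.write(ch)
        out.flush()      # push the character out now, not at the next newline
    out.write("\n")

# Non-interactive demonstration: no real keypresses, output captured in memory
buffer = io.StringIO()
typing("String", lambda: None, out=buffer)
print(repr(buffer.getvalue()))
```

In the interactive case the call would be typing("String", getch.getch), assuming the getch module from the question is installed.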

join user input at index of inputs

I can't seem to figure out why I can't get this loop to loop - it always breaks instead. I believe if it was looping, the script (hopefully) would be working as instructed.
I've attached the instructions to the script and inline to explain my thinking.
Thanks!
The script accepts user input, and every time it receives a string, it should add the string to a growing string. Newly added strings should be added to the growing string at the index equal to the newly added string's length. If the newly added string's length is equal to or larger than the growing string's length, the script should add the new string to the end of the growing string. When the script receives a blank input, it should stop receiving input and print the growing string to the console.
if __name__ == "__main__":
    user_word = input()
    second_word = input()
    results = user_word + second_word[:]
    i = results
    while results == "": # When script receives a blank input
        print(results) # stop receiving input and print the growing string
        break
    if user_word >= results: # if newly added string length equal to or larger
        results = user_word + second_word[:]
        user_word.join(results) # the new string added to end of the growing string.
        print(results)
    if user_word < results: # new string is shorter than the existing string THEN
        results = user_word + second_word[:] # add the new string at the index equal to the new string length.
        user_word.join(results) # Newly added strings should be added to the growing string
        print(results)
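s = ''
while True:
    user_word = input('Enter string: ')
    if user_word == '':
        # A blank input ends the loop; check it first so it also works while s is still empty
        print(s)
        break
    elif len(user_word) >= len(s):
        # New string is at least as long as the growing string: append it
        s = s + user_word
    else:
        # Otherwise insert it at the index equal to its own length
        s = s[:len(user_word)] + user_word + s[len(user_word):]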
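To check the insertion rule without typing input by hand, the same logic can be wrapped in a function that takes the inputs as a list. This is only a sketch; grow_string is a made-up name, and the blank-input rule becomes "stop at the first empty string".

```python
def grow_string(words):
    """Apply the growing-string rules to a sequence of inputs."""
    s = ''
    for word in words:
        if word == '':
            break                      # blank input: stop receiving
        n = len(word)
        if n >= len(s):
            s = s + word               # long enough: append to the end
        else:
            s = s[:n] + word + s[n:]   # shorter: insert at index len(word)
    return s

# 'hello' is appended, 'ab' goes in at index 2, 'xyz' at index 3
print(grow_string(['hello', 'ab', 'xyz', '']))  # heaxyzbllo
```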

Returning every instance of whatever's between two strings in a file [Python 3]

What I'm trying to do is open a file, then find every instance of '[\x06I"' and '\x06;', then return whatever is between the two.
Since this is not a standard text file (it's map data from RPG maker) readline() will not work for my purposes, as the file is not at all formatted in such a way that the data I want is always neatly within one line by itself.
What I'm doing right now is loading the file into a list with read(), then simply deleting characters from the very beginning until I hit the string '[\x06I'. Then I scan ahead to find '\x06;', store what's between them as a string, append said string to a list, then resume at the character after the semicolon I found.
It works, and I ended up with pretty much exactly what I wanted, but I feel like that's the worst possible way to go about it. Is there a more efficient way?
My relevant code:
while eofget == 0:
    savor = 0
    while savor == 0 or eofget == 0:
        if line[0:4] == '[\x06I"':
            x = 4
            spork = 0
            while spork == 0:
                x += 1
                if line[x] == '\x06':
                    if line[x+1] == ';':
                        spork = x
                        savor = line[5:spork] + "\n"
                        line = line[x+1:]
                        linefinal[lineinc] = savor
                        lineinc += 1
                elif line[x:x+7] == '#widthi':
                    print("eof reached")
                    spork = 1
                    eofget = 1
                    savor = 0
        elif line[x:x+7] == '#widthi':
            print("finished map " + mapname)
            eofget = 1
            savor = 0
            break
        else:
            line = line[1:]
You can just ignore the variable names. I just name things the first thing that comes to mind when I'm doing one-offs like this. And yes, I am aware a few things in there don't make any sense, but I'm saving cleanup for when I finalize the code.
When eofget gets flipped on this subroutine terminates and the next map is loaded. Then it repeats. The '#widthi' check is basically there to save time, since it's present in every map and indicates the beginning of the map data, AKA data I don't care about.
I feel this is a natural case to use regular expressions. Using the findall method:
>>> s = 'testing[\x06I"text in between 1\x06;filler text[\x06I"text in between 2\x06;more filler[\x06I"text in between \n with some line breaks \n included in the text\x06;ending'
>>> import re
>>> p = re.compile(r'\[\x06I"(.+?)\x06;', re.DOTALL)
>>> print(p.findall(s))
['text in between 1', 'text in between 2', 'text in between \n with some line breaks \n included in the text']
The regex string r'\[\x06I"(.+?)\x06;' can be interpreted as follows:
Match as little as possible (denoted by ?) of an undetermined number of unspecified characters (denoted by .+) surrounded by '[\x06I"' and '\x06;', and only return the enclosed text (denoted by the parentheses around .+?).
Adding re.DOTALL in the compile makes the . match line breaks as well, allowing multi-line text to be captured. The r prefix makes it a raw string, so the backslash in \[ reaches the regex engine unchanged.
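Since the map file is not plain text, it may be safer to read it in binary mode and run the same pattern over bytes. The sketch below assumes that; MARKED_TEXT and extract_strings are names chosen for this example, and the sample data is invented to stand in for real file contents.

```python
import re

# Same pattern as above, written as a bytes pattern for binary data
MARKED_TEXT = re.compile(rb'\[\x06I"(.+?)\x06;', re.DOTALL)

def extract_strings(data):
    """Return every chunk found between '[\\x06I"' and '\\x06;' in data."""
    return MARKED_TEXT.findall(data)

# Invented sample standing in for the contents of a map file
sample = b'junk[\x06I"first line\x06;more junk[\x06I"second\nline\x06;#widthi'
print(extract_strings(sample))  # [b'first line', b'second\nline']
```

With a real file, the bytes would come from opening it with mode 'rb' and calling read(); each result can then be decoded with whatever encoding the file actually uses.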
I would use split():
fulltext = 'adsfasgaseg[\x06I"thisiswhatyouneed\x06;sdfaesgaegegaadsf[\x06I"this is the second what you need \x06;asdfeagaeef'
parts = fulltext.split('[\x06I"') # split by first label
results = []
for part in parts:
    if '\x06;' in part: # if second label exists in part
        results.append(part.split('\x06;')[0]) # get the part until the second label
print(results)

Iteration issue in python

[Code below question]
The idea is to expand on Python's built-in split() function. This function takes two strings: the first is the string to split, and the second lists the characters to split at and omit. This code has worked on other inputs, but with this input it does not return anything past the last comma. In other words, no matter what the input is in this format, nothing after the final comma gets appended. I have gone through this code line by line and I can't find where I am losing it.
Why is my code not iterating through any characters past the last comma?
def split_string(source,splitlist):
    ## Variables ##
    output = []
    start, start_pos , tracker = 0 , 0 , 0
    ## Iterations ##
    for char in source:
        start = source.find(char,start)
        if char in splitlist:
            tracker += 1
            if tracker <= 1:
                end_pos = source.find(char, start)
                output.append(source[start_pos:end_pos])
                start_pos = end_pos + 1
            else:
                start_pos+=1
        else:
            tracker = 0
    return output
out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print(out)
Because your code never appends the text between the last comma and the end of the string. The only append happens when a split character is found:
end_pos = source.find(char, start)
output.append(source[start_pos:end_pos])
You need to append the range between the last comma and the end of the string as a final step.
Add the following after the loop ends:
output.append(source[end_pos+1:])
Modified code:
http://ideone.com/9Khu4g
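For reference, here is a simplified sketch of the same idea. It is not the code behind the link above. It tracks positions with enumerate instead of find(), skips empty pieces between consecutive split characters, and appends the trailing piece after the loop.

```python
def split_string(source, splitlist):
    """Split source at any character that appears in splitlist."""
    output = []
    start_pos = 0  # index where the current piece begins
    for index, char in enumerate(source):
        if char in splitlist:
            if index > start_pos:          # skip empty pieces
                output.append(source[start_pos:index])
            start_pos = index + 1
    if start_pos < len(source):            # the piece after the last separator
        output.append(source[start_pos:])
    return output

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code", ",")
print(out)
```

The final if is the part the original function was missing: without it, "Zip Code" is never added because no split character follows it.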
