Python: Removing certain characters from a word using while loop - python

Given that certain characters are 'abcdef' = char
I would like to 1) remove 3rd char of the 3 chars in a row in a word, and 2) from changed word, remove 2nd char of the 2 chars in a row in the word.
E.g. If there is a word 'bacifeaghab' it should
1) first remove 'c' and 'a', which is ba(c)ife(a)hab and change word into 'baifehab'
2) remove 'a','e', and 'b', which is b(a)if(e)ha(b) and change word into 'bifha'
This is what I have done so far, but when I run this and put word in it, it doesn't pint anything. Not even error or blank(' '), it just goes to next line without '>>>'.
def removal(w):
x,y = 0,0
while y < len(w)-2:
if (w[y] and w[y+1] and w[y+2]) in 'abcdef':
w = w[:y+2] + w[y+3:]
while x < len(w):
if w[x] in 'abcedf':
w = w[:x+1] + w[x+2:]
x = x+1
else :
x = x+1
return(w)
Could anyone find out what's wrong??
Since it was first time for me to use while loop, I thought that using double while loop can be a problem, so also tried,
def removal(w):
x,y,z = 0,0,0
while y < len(w)-2:
if (w[y] and w[y+1] and w[y+2]) in 'abcdef':
w = w[:y+2] + w[y+3:]
return(w)
But same result. I also tried print(w) at the end of function. same result.

You had a couple of mistakes:
There were 2 errors that I have corrected in your code. The first being that you weren't incrementing x properly (as you did not need the elif) and you had forgotten completely to increment y!
I corrected these and then also, in your if conditions, your syntax was incorrect. The part in the brackets evaluated to the last element and then just this was checked to see if it was in the string 'abcdef'. I have corrected that now to check each individual element in turn.
So now the function is:
def removal(w):
chars = 'abcdef'
x,y = 0,0
while y < len(w)-2:
if w[y] in chars and w[y+1] in chars and w[y+2] in chars:
w = w[:y+2] + w[y+3:]
y += 3
while x < len(w):
if w[x] in 'abcedf' and w[x+1] in 'abcdef':
w = w[:x+1] + w[x+2:]
x += 2
return w
and calling it with 'bacifeahab' (without the 'g' which I think was a typo in your e.g.):
removal("bacifeahab")
returns what you wanted:
'bifha'
Hope this helps!

Related

Not Parsing Through

I tried to parse through a text file, and see the index of the character where the four characters before it are each different. Like this:
wxrgh
The h would be the marker, since it is after the four different digits, and the index would be 4. I would find the index by converting the text into an array, and it works for the test but not for the actually input. Does anyone know what is wrong.
def Repeat(x):
size = len(x)
repeated = []
for i in range(_size):
k = i + 1
for j in range(k, _size):
if x[i] == x[j] and x[i] not in repeated:
repeated.append(x[i])
return repeated
with open("input4.txt") as f:
text = f.read()
test_array = []
split_array = list(text)
woah = ""
for i in split_array:
first = split_array[split_array.index(i)]
second = split_array[split_array.index(i) + 1]
third = split_array[split_array.index(i) + 2]
fourth = split_array[split_array.index(i) + 3]
test_array.append(first)
test_array.append(second)
test_array.append(third)
test_array.append(fourth)
print(test_array)
if Repeat(test_array) != []:
test_array = []
else:
woah = split_array.index(i)
print(woah)
print(woah)
I tried a test document and unit tests but that still does not work
You can utilise a set to help you with this.
Read the entire file into a list (buffer). Iterate over the buffer starting at offset 4. Create a set of the 4 characters that precede the current position. If the length of the set is 4 (i.e., they're all different) and the character at the current position is not in the set then you've found the index you're interested in.
W = 4
with open('input4.txt') as data:
buffer = data.read()
for i in range(W, len(buffer)):
if len(s := set(buffer[i-W:i])) == W and buffer[i] not in s:
print(i)
Note:
If the input data are split over multiple lines you may want to remove newline characters.
You will need to be using Python 3.8+ to take advantage of the assignment expression (walrus operator)

How to find the most amount of shared characters in two strings? (Python)

yamxxopd
yndfyamxx
Output: 5
I am not quite sure how to find the number of the most amount of shared characters between two strings. For example (the strings above) the most amount of characters shared together is "yamxx" which is 5 characters long.
xx would not be a solution because that is not the most amount of shared characters. In this case the most is yamxx which is 5 characters long so the output would be 5.
I am quite new to python and stack overflow so any help would be much appreciated!
Note: They should be the same order in both strings
Here is simple, efficient solution using dynamic programming.
def longest_subtring(X, Y):
m,n = len(X), len(Y)
LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
result = 0
for i in range(m + 1):
for j in range(n + 1):
if (i == 0 or j == 0):
LCSuff[i][j] = 0
elif (X[i-1] == Y[j-1]):
LCSuff[i][j] = LCSuff[i-1][j-1] + 1
result = max(result, LCSuff[i][j])
else:
LCSuff[i][j] = 0
print (result )
longest_subtring("abcd", "arcd") # prints 2
longest_subtring("yammxdj", "nhjdyammx") # prints 5
This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1 # Let's start with the longest possible sub-string
while (not found) and sub_length: # Loop, over decreasing lengths of sub-string
for start in range(l_1 - sub_length + 1): # Loop, over all start-positions of sub-string
sub_str = s_1[start:(start+sub_length)] # Get the sub-string at that start-position
if sub_str in s_2: # If found a match for the sub-string, in s_2
found = True # Stop trying with smaller lengths of sub-string
break # Stop trying with this length of sub-string
else: # If no matches found for this length of sub-string
sub_length -= 1 # Let's try a smaller length for the sub-strings
print (f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5
s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# creating and initializing a string without repetition
s = ""
for x in s1:
if x not in s:
s = s + x
for x in s:
if x in s2:
counter = counter + 1
# display the number of the most amount of shared characters in two strings s1 and s2
print(counter) # display 5

Replacing a character in a string with a set of two possible characters

a = ["0$%","0%%%","0$%$%","0$$"]
The above is a corrupted communication code where the first element of each sequence has been disguised as 0. I want to recover the original and correct code by computing a list of all possible sequences by replacing 0 with either $ or % and then checking which of the sequences is valid. Think of each sequence as corresponding to an alphabet if correct. For instance, "$$$" could correspond to the alphabet "B".
This is what I've done so far
raw_decoded = []
word = []
for i in a:
for j in i:
if j == "0":
x = list(itertools.product(["$", "%"], *i[1:]))
y = ("".join(i) for i in x)
for i in y:
raw_decoded.append(i)
for i in raw_decoded:
letter = code_dict[i] #access dictionary for converting to alphabet
word.append(letter)
return word
Try that:
output = []
for elem in a:
replaced_dollar = elem.replace('0', '$', 1)
replaced_percent = elem.replace('0', '%', 1)
# check replaced_dollar and replaced_percent
# and then write to output
output.append(replaced_...)
Not sure what you mean, perhaps you could add a desired output. What I got from your question could be solved in the following way:
b = []
for el in a:
if el[0] == '0':
b.append(el.replace('0', '%', 1))
b.append(el.replace('0', '$', 1))
else:
b.append(el)

extract substring pattern

I have long file like 1200 sequences
>3fm8|A|A0JLQ2
CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP
QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA
LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL
I want to read each possible pattern has cysteine in middle and has in the beginning five string and follow by other five string such as xxxxxCxxxxx
the output should be like this:
QDIQLCGMGIL
ILPEHCIIDIT
TISDNCVVIFS
FSKTSCSYCTM
this is the pogram only give position of C . it is not work like what I want
pos=[]
def find(ch,string1):
for i in range(len(string1)):
if ch == string1[i]:
pos.append(i)
return pos
z=find('C','AWERQRTCWERTYCTAAAACTTCTTT')
print z
You need to return outside the loop, you are returning on the first match so you only ever get a single character in your list:
def find(ch,string1):
pos = []
for i in range(len(string1)):
if ch == string1[i]:
pos.append(i)
return pos # outside
You can also use enumerate with a list comp in place of your range logic:
def indexes(ch, s1):
return [index for index, char in enumerate(s1)if char == ch and 5 >= index <= len(s1) - 6]
Each index in the list comp is the character index and each char is the actual character so we keep each index where char is equal to ch.
If you want the five chars that are both sides:
In [24]: s="CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTP QKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP"
In [25]: inds = indexes("C",s)
In [26]: [s[i-5:i+6] for i in inds]
Out[26]: ['QDIQLCGMGIL', 'ILPEHCIIDIT']
I added checking the index as we obviously cannot get five chars before C if the index is < 5 and the same from the end.
You can do it all in a single function, yielding a slice when you find a match:
def find(ch, s):
ln = len(s)
for i, char in enumerate(s):
if ch == char and 5 <= i <= ln - 6:
yield s[i- 5:i + 6]
Where presuming the data in your question is actually two lines from yoru file like:
s="""">3fm8|A|A0JLQ2CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
>2ht9|A|A0JLT0LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDALYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCY"""
Running:
for line in s.splitlines():
print(list(find("C" ,line)))
would output:
['0JLQ2CFLVNL', 'QDIQLCGMGIL', 'ILPEHCIIDIT']
['TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']
Which gives six matches not four as your expected output suggest so I presume you did not include all possible matches.
You can also speed up the code using str.find, starting at the last match index + 1 for each subsequent match
def find(ch, s):
ln, i = len(s) - 6, s.find(ch)
while 5 <= i <= ln:
yield s[i - 5:i + 6]
i = s.find(ch, i + 1)
Which will give the same output. Of course if the strings cannot overlap you can start looking for the next match much further in the string each time.
My solution is based on regex, and shows all possible solutions using regex and while loop. Thanks to #Smac89 for improving it by transforming it into a generator:
import re
string = """CFLVNLNADPALNELLVYYLKEHTLIGSANSQDIQLCGMGILPEHCIIDITSEGQVMLTPQKNTRTFVNGSSVSSPIQLHHGDRILWGNNHFFRLNLP
LATAPVNQIQETISDNCVVIFSKTSCSYCTMAKKLFHDMNVNYKVVELDLLEYGNQFQDA LYKMTGERTVPRIFVNGTFIGGATDTHRLHKEGKLLPLVHQCYL"""
# Generator
def find_cysteine2(string):
# Create a loop that will utilize regex multiple times
# in order to capture matches within groups
while True:
# Find a match
data = re.search(r'(\w{5}C\w{5})',string)
# If match exists, let's collect the data
if data:
# Collect the string
yield data.group(1)
# Shrink the string to not include
# the previous result
location = data.start() + 1
string = string[location:]
# If there are no matches, stop the loop
else:
break
print [x for x in find_cysteine2(string)]
# ['QDIQLCGMGIL', 'ILPEHCIIDIT', 'TISDNCVVIFS', 'FSKTSCSYCTM', 'TSCSYCTMAKK']

The Alphabet and Recursion

I'm almost done with my program, but I've made a subtle mistake. My program is supposed to take a word, and by changing one letter at a time, is eventually supposed to reach a target word, in the specified number of steps. I had been trying at first to look for similarities, for example: if the word was find, and the target word lose, here's how my program would output in 4 steps:
['find','fine','line','lone','lose]
Which is actually the output I wanted. But if you consider a tougher set of words, like Java and work, the output is supposed to be in 6 steps.
['java', 'lava', 'lave', 'wave', 'wove', 'wore', 'work']
So my mistake is that I didn't realize you could get to the target word, by using letters that don't exist in the target word or original word.
Here's my Original Code:
import string
def changeling(word,target,steps):
alpha=string.ascii_lowercase
x=word##word and target has been changed to keep the coding readable.
z=target
if steps==0 and word!= target:##if the target can't be reached, return nothing.
return []
if x==z:##if target has been reached.
return [z]
if len(word)!=len(target):##if the word and target word aren't the same length print error.
print "error"
return None
i=1
if lookup
if lookup(z[0]+x[1:]) is True and z[0]+x[1:]!=x :##check every letter that could be from z, in variations of, and check if they're in the dictionary.
word=z[0]+x[1:]
while i!=len(x):
if lookup(x[:i-1]+z[i-1]+x[i:]) and x[:i-1]+z[i-1]+x[i:]!=x:
word=x[:i-1]+z[i-1]+x[i:]
i+=1
if lookup(x[:len(x)-1]+z[len(word)-1]) and x[:len(x)-1]+z[len(x)-1]!=x :##same applies here.
word=x[:len(x)-1]+z[len(word)-1]
y = changeling(word,target,steps-1)
if y :
return [x] + y##used to concatenate the first word to the final list, and if the list goes past the amount of steps.
else:
return None
Here's my current code:
import string
def changeling(word,target,steps):
alpha=string.ascii_lowercase
x=word##word and target has been changed to keep the coding readable.
z=target
if steps==0 and word!= target:##if the target can't be reached, return nothing.
return []
if x==z:##if target has been reached.
return [z]
holderlist=[]
if len(word)!=len(target):##if the word and target word aren't the same length print error.
print "error"
return None
i=1
for items in alpha:
i=1
while i!=len(x):
if lookup(x[:i-1]+items+x[i:]) is True and x[:i-1]+items+x[i:]!=x:
word =x[:i-1]+items+x[i:]
holderlist.append(word)
i+=1
if lookup(x[:len(x)-1]+items) is True and x[:len(x)-1]+items!=x:
word=x[:len(x)-1]+items
holderlist.append(word)
y = changeling(word,target,steps-1)
if y :
return [x] + y##used to concatenate the first word to the final list, and if the/
list goes past the amount of steps.
else:
return None
The differences between the two is that the first checks every variation of find with the letters from lose. Meaning: lind, fond, fisd, and fine. Then, if it finds a working word with the lookup function, it calls changeling on that newfound word.
As opposed to my new program, which checks every variation of find with every single letter in the alphabet.
I can't seem to get this code to work. I've tested it by simply printing what the results are of find:
for items in alpha:
i=1
while i!=len(x):
print (x[:i-1]+items+x[i:])
i+=1
print (x[:len(x)-1]+items)
This gives:
aind
fand
fiad
fina
bind
fbnd
fibd
finb
cind
fcnd
ficd
finc
dind
fdnd
fidd
find
eind
fend
fied
fine
find
ffnd
fifd
finf
gind
fgnd
figd
fing
hind
fhnd
fihd
finh
iind
find
fiid
fini
jind
fjnd
fijd
finj
kind
fknd
fikd
fink
lind
flnd
fild
finl
mind
fmnd
fimd
finm
nind
fnnd
find
finn
oind
fond
fiod
fino
pind
fpnd
fipd
finp
qind
fqnd
fiqd
finq
rind
frnd
fird
finr
sind
fsnd
fisd
fins
tind
ftnd
fitd
fint
uind
fund
fiud
finu
vind
fvnd
fivd
finv
wind
fwnd
fiwd
finw
xind
fxnd
fixd
finx
yind
fynd
fiyd
finy
zind
fznd
fizd
finz
Which is perfect! Notice that each letter in the alphabet goes through my word at least once. Now, what my program does is use a helper function to determine if that word is in a dictionary that I've been given.
Consider this, instead of like my first program, I now receive multiple words that are legal, except when I do word=foundword it means I'm replacing the previous word each time. Which is why I'm trying holderlist.append(word).
I think my problem is that I need changeling to run through each word in holderlist, and I'm not sure how to do that. Although that's only speculation.
Any help would be appreciated,
Cheers.
I might be slightly confused about what you need, but by borrowing from this post I belive I have some code that should be helpful.
>>> alphabet = 'abcdefghijklmnopqrstuvwxyz'
>>> word = 'java'
>>> splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
>>> splits
[('', 'java'), ('j', 'ava'), ('ja', 'va'), ('jav', 'a'), ('java', '')]
>>> replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
>>> replaces
['aava', 'bava', 'cava', 'dava', 'eava', 'fava', 'gava', 'hava', 'iava', 'java', 'kava', 'lava', 'mava', 'nava', 'oava', 'pava', 'qava', 'rava', 'sava', 'tava', 'uava', 'vava', 'wav
a', 'xava', 'yava', 'zava', 'java', 'jbva', 'jcva', 'jdva', 'jeva', 'jfva', 'jgva', 'jhva', 'jiva', 'jjva', 'jkva', 'jlva', 'jmva', 'jnva', 'jova', 'jpva', 'jqva', 'jrva', 'jsva', '
jtva', 'juva', 'jvva', 'jwva', 'jxva', 'jyva', 'jzva', 'jaaa', 'jaba', 'jaca', 'jada', 'jaea', 'jafa', 'jaga', 'jaha', 'jaia', 'jaja', 'jaka', 'jala', 'jama', 'jana', 'jaoa', 'japa'
, 'jaqa', 'jara', 'jasa', 'jata', 'jaua', 'java', 'jawa', 'jaxa', 'jaya', 'jaza', 'java', 'javb', 'javc', 'javd', 'jave', 'javf', 'javg', 'javh', 'javi', 'javj', 'javk', 'javl', 'ja
vm', 'javn', 'javo', 'javp', 'javq', 'javr', 'javs', 'javt', 'javu', 'javv', 'javw', 'javx', 'javy', 'javz']
Once you have a list of all possible replaces, you can simply do
valid_words = [valid for valid in replaces if lookup(valid)]
Which should give you all words that can be formed by replacing 1 character in word. By placing this code in a separate method, you could take a word, obtain possible next words from that current word, and recurse over each of those words. For example:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def next_word(word):
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
return [valid for valid in replaces if lookup(valid)]
Is this enough help? I think your code could really benefit by separating tasks into smaller chunks.
Fixed your code:
import string
def changeling(word, target, steps):
alpha=string.ascii_lowercase
x = word #word and target has been changed to keep the coding readable.
z = target
if steps == 0 and word != target: #if the target can't be reached, return nothing.
return []
if x == z: #if target has been reached.
return [z]
holderlist = []
if len(word) != len(target): #if the word and target word aren't the same length print error.
raise BaseException("Starting word and target word not the same length: %d and %d" % (len(word),
i = 1
for items in alpha:
i=1
while i != len(x):
if lookup(x[:i-1] + items + x[i:]) is True and x[:i-1] + items + x[i:] != x:
word = x[:i-1] + items + x[i:]
holderlist.append(word)
i += 1
if lookup(x[:len(x)-1] + items) is True and x[:len(x)-1] + items != x:
word = x[:len(x)-1] + items
holderlist.append(word)
y = [changeling(pos_word, target, steps-1) for pos_word in holderlist]
if y:
return [x] + y #used to concatenate the first word to the final list, and if the list goes past the amount of steps.
else:
return None
Where len(word) and len(target), it'd be better to raise an exception than print something obscure, w/o a stack trace and non-fatal.
Oh and backslashes(\), not forward slashes(/), are used to continue lines. And they don't work on comments

Categories