Optimal way to "stamp" string into desired string - python

So, I was looking for an algorithm for the following problem:
You are given a desired string s and a stamp t, which is also a string. The starting string is len(s)*"?".
Is it possible to transform the starting string into s using the stamp? The whole stamp must fit inside the string on every application (the stamp's borders may not exceed the ?????... string's borders).
If it is possible, print the number of stamps required and the left border of the stamp for each stamping.
Example:
AABCACA (desired result)
ABCA (stamp)
Solution:
3
1 4 2
Explanation (stamping at 1, then 4, then 2): ??????? → ABCA??? → ABCABCA → AABCACA.
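To make the stamping rules concrete, here is a small simulator (apply_stamps is a hypothetical helper, not part of the question) that presses the stamp at the given 1-based left borders in order and rejects any position where the stamp would cross the border. It reproduces the example above.

```python
def apply_stamps(length, stamp, positions):
    """Start from '?' * length and press `stamp` at each 1-based left border
    in `positions`, in order. Returns every intermediate string."""
    board = ["?"] * length
    history = ["".join(board)]
    for pos in positions:
        start = pos - 1  # the question uses 1-based left borders
        if start < 0 or start + len(stamp) > length:
            raise ValueError(f"stamp at {pos} would cross the string's border")
        board[start:start + len(stamp)] = stamp  # later stamps overwrite earlier ones
        history.append("".join(board))
    return history


print(" -> ".join(apply_stamps(7, "ABCA", [1, 4, 2])))
# ??????? -> ABCA??? -> ABCABCA -> AABCACA
```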
My solution:
If the stamp's first letter is not the desired string's first letter, the task is not possible. The same goes for the last letter. If the stamp doesn't have all the letters in the desired string, the task is impossible.
My algorithm goes like this: try to find the stamp in the desired string. If it is found, replace it with question marks and mark down the left border of the stamp. Do this as long as you can.
Then look for the stamp's contiguous substrings of length len(stamp)-1. If you find any of those, replace them with question marks and mark down the left border of the stamp.
Then look for the stamp's contiguous substrings of length len(stamp)-2, and so on, until you are finished. There you have the answer.
The problems
I'm not sure what is wrong with my code as it can't seem to pass some test cases. There is probably a logical error.
import sys
desiredString = input()
stamp = input()
stampingSpots = []
if (len(set(desiredString)) != len(set(stamp)) or stamp[0] != desiredString[0] or stamp[-1] != desiredString[-1]):
    print("-1")
    sys.exit()
def searchAndReplace(stringToFind, fix): #Search for stringToFind and replace it with len(stringToFind)*"?". Fix is used to fix the position.
    global desiredString
    for x in range(0, len(desiredString)-len(stringToFind)+1):
        if desiredString[x:x+len(stringToFind)] == stringToFind:
            stampingSpots.append(x+1-fix) #Mark down the stamping spot
            firstPart = desiredString[:x]
            firstPart += len(stringToFind)*"?"
            firstPart += desiredString[len(firstPart):]
            desiredString = firstPart
            return True
    return False
while(searchAndReplace(stamp,0)): #Search for the full stamp in desiredString
    searchAndReplace(stamp,0)
length = len(stamp)-1
while(length > 0):
    for firstPart in range(0, len(stamp)-length+1):
        secondPart = firstPart+length
        while(searchAndReplace(stamp[firstPart:secondPart], firstPart)):
            searchAndReplace(stamp[firstPart:secondPart], firstPart)
            if len(stampingSpots) > 10*len(desiredString): #Too much output, not possible
                print("-1")
                sys.exit()
    length -= 1
print(len(stampingSpots))
for i in reversed(stampingSpots):
    print(i, end = " ")

The algorithm you describe is fundamentally flawed. The results it produces simply don't correspond to things the stamp can actually do. For example, with stamp AB and string AAA, it will try to stamp beyond the borders of the string to apply the final A. It will also try to use the AB and BC substrings of the stamp ABC directly next to each other for the string ABBC, but no actual application of the stamp can do that.
The stamp cannot be used to apply arbitrary substrings of the stamp string. It can be used to stamp over previous stamp applications, but your algorithm doesn't consider the full complexity of overstamping. Also, even if you could stamp arbitrary substrings of the stamp string, you haven't proven your algorithm minimizes stamp applications.
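A small brute-force check makes this concrete. The sketch below (reachable is a hypothetical name) enumerates every string reachable from the all-"?" start by pressing the whole stamp at legal positions, so overstamping is fully accounted for. It is only practical for tiny lengths, since the number of states can grow exponentially.

```python
from collections import deque


def reachable(length, stamp):
    """Breadth-first search over every string obtainable from '?' * length
    by pressing the whole stamp, any number of times, inside the borders."""
    start = "?" * length
    seen = {start}
    queue = deque([start])
    m = len(stamp)
    while queue:
        cur = queue.popleft()
        for i in range(length - m + 1):  # every legal 0-based left border
            nxt = cur[:i] + stamp + cur[i + m:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print("AAA" in reachable(3, "AB"))    # False
print("ABBC" in reachable(4, "ABC"))  # False
```

With stamp ABC and length 4 the only reachable strings are ????, ABC?, ?ABC, AABC and ABCC, so ABBC can never be produced.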

We can use divide and conquer: let f(s) be the minimum number of stamps required to generate string s, where "*" is a wildcard. Then:
Greedily pick the part of the string that is the largest match for the stamp.
Set that part to wildcards and pass each of its left and right parts (each keeping the new wildcards) to f.
For example:
AABCACA (desired result)
ABCA (stamp)

f(AABCACA)
   ^^^^
   ABCA (match)
= 1 + f(A****) + f(****CA)

=> f(A****)
     ^^^^
     ABCA (match)
=> f(****CA)
       ^^^^
       ABCA (match)
Total 3
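The greedy recursion described above can be sketched in Python as follows. This is a sketch under assumptions: "largest match" is taken to mean the window with the most non-wildcard characters agreeing with the stamp (all of its non-wildcard characters must agree), the matched window is treated as the last stamp applied, and the names greedy_stamps, match_score and WILD are hypothetical. Nothing here proves the greedy choice yields the minimum in general.

```python
WILD = "*"


def match_score(window, stamp):
    """Number of non-wildcard characters in `window` that agree with `stamp`,
    or -1 if some non-wildcard character disagrees."""
    score = 0
    for w, c in zip(window, stamp):
        if w == WILD:
            continue
        if w != c:
            return -1
        score += 1
    return score


def greedy_stamps(s, stamp, offset=0):
    """Return 0-based left borders in application order, or None if the
    greedy choice gets stuck."""
    if all(ch == WILD for ch in s):
        return []  # nothing left to produce
    m = len(stamp)
    best_i, best_score = -1, 0
    for i in range(len(s) - m + 1):
        score = match_score(s[i:i + m], stamp)
        if score > best_score:  # must cover at least one real character
            best_i, best_score = i, score
    if best_i < 0:
        return None
    # The matched window is the topmost stamp; solve both sides separately,
    # each keeping the window as wildcards so earlier stamps may overlap it.
    left = s[:best_i] + WILD * m
    right = WILD * m + s[best_i + m:]
    left_spots = greedy_stamps(left, stamp, offset)
    right_spots = greedy_stamps(right, stamp, offset + best_i)
    if left_spots is None or right_spots is None:
        return None
    return left_spots + right_spots + [offset + best_i]


spots = greedy_stamps("AABCACA", "ABCA")
print(len(spots), [p + 1 for p in spots])  # 3 [1, 4, 2]
```

On the example it yields the 1-based borders 1 4 2 in application order, matching the expected answer, and it returns None for the AAA/AB and ABBC/ABC cases.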

Related

Having problems shifting the characters of a string based on their ascii value

Problem: "For this problem you will be implementing a Caesar Cipher. A Caesar Cipher takes a string and a shift amount and shifts the characters in the string by the shift amount to create an enciphered string. If a character would be shifted past the end of the alphabet then it wraps back around to the beginning. For example, if the shift amount was 1 then a -> b, b -> c, c -> d, ... y -> z, and z -> a."
I do not know what is going on or why my code refuses to work.
You have more than one problem; see below for some explanation and a way to solve the issue:
new_user_input is being reset on each iteration, so only the last character ends up changed.
The rollover could be applied using replace, but replace returns the full string, so you would have to know which indexes had changed. You can turn the problem around by walking through your input and picking the correct index for each character.
A way of solving the problem:
import string

def character_shifter():
    user_input = 'abz'
    shift = 2
    new_user_input = ''
    alphabet = list(string.ascii_lowercase)
    for c in user_input:
        assert c in alphabet, f"invalid entry {c} not in {alphabet}"
        reduce_chr = ord(c) - ord('a')  # subtract ord('a') so 'a' lines up with index 0 of the list (otherwise you get an IndexError)
        shifted_chr = reduce_chr + shift  # apply the shift value
        new_idx_alphabet = shifted_chr % len(alphabet)  # the modulus operator handles the rollover (e.g. 27 % 26 = 1)
        new_user_input += alphabet[new_idx_alphabet]  # add the shifted characters one at a time
    print(new_user_input)

character_shifter()  # prints cdb
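If you want the text and shift as parameters, a variant (caesar_shift is a hypothetical name) can build a translation table with str.maketrans from the alphabet rotated by the shift; the modulus again handles the rollover, including negative shifts. Unlike the code above, it leaves characters outside a-z unchanged instead of asserting, which is an assumption about the desired behavior.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift lowercase ASCII letters by `shift`, wrapping z -> a.
    Characters outside a-z pass through unchanged."""
    alphabet = string.ascii_lowercase
    k = shift % len(alphabet)              # normalise the shift (also handles negatives)
    rotated = alphabet[k:] + alphabet[:k]  # e.g. k = 2 gives "cdef...zab"
    table = str.maketrans(alphabet, rotated)
    return text.translate(table)


print(caesar_shift("abz", 2))  # cdb
print(caesar_shift("xyz", 1))  # yza
```

Decoding is the same call with the negated shift.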

Given some string and index, find longest repeated string

I apologize if this question has been answered elsewhere on this site, but I have searched for a while and have not found a similar question. For some slight context, I am working with RNA sequences.
Without diving into the Bio aspect, my question boils down to this:
Given a string and an index/position, I want to find the largest matching substring based on that position.
For example:
Input
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
position = 13 # the f in the substring 'stackoverflow'
Desired Output
rflo
So basically, despite 'stackov' being the longest repeated substring within the string, I only want the largest repeated substring based on the index given.
Any help is appreciated. Thanks!
Edit
I appreciate the answers provided thus far. However, I intentionally made position equal to 13 in order to show that I want to search and expand on either side of the starting position, not just to the right.
We iteratively check longer and longer substrings starting at position (treated as 1-based here, hence index = position - 1), simply checking whether they occur in the remaining string using the in keyword. j is the length of the substring currently being tested, string[index:index+j], and longest keeps track of the longest substring seen so far. We can break as soon as the substring starting at position no longer occurs with the current length j.
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
position = 13
index = position - 1
longest = 0
for j in range(1, (len(string) - index) // 2):
    if string[index:index + j] in string[index + j:]:
        longest = j
    else:
        break
print(longest)
print(string[index:index + longest])
Output:
4
rflo
Use the in keyword to check for presence in the remainder of the string, like this:
string = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
# Python string indices start at 0
position = 12
longest = 0
for sub_len in range(1, len(string) - position + 1):
    # Check whether the substring occurs again after itself or anywhere before position
    sub_string = string[position:position + sub_len]
    if sub_string in string[position + sub_len:] or sub_string in string[0:position]:
        longest = sub_len  # still repeated, so try one character longer
    else:
        break  # no other occurrence, so no longer substring can repeat either
# longest is the length of the last substring that was still found elsewhere
print(string[position:position + longest])
Output:
rflo
If you set position = 32, this returns stackov, showing how it searches from the beginning as well.
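Both snippets above only grow the substring to the right of position, while the edit asks to expand on either side. A brute-force sketch (longest_repeat_covering is a hypothetical name) tries every substring that contains position and keeps the longest one that also starts at a different index. It treats position as 0-based, so 13 is the f as in the question, and with roughly cubic running time it is only meant for short strings.

```python
def longest_repeat_covering(s, pos):
    """Longest substring s[i:j] with i <= pos < j that also starts at some
    other index of s (overlapping occurrences count). pos is 0-based."""
    best = ""
    for i in range(pos + 1):                  # left end may move left of pos
        for j in range(pos + 1, len(s) + 1):  # right end may move right of pos
            sub = s[i:j]
            if len(sub) <= len(best):
                continue                      # cannot beat the current best
            k = s.find(sub)                   # first occurrence is at or before i
            if k == i:
                k = s.find(sub, i + 1)        # look for one that starts later
            if k == -1:
                break  # no other occurrence, so longer extensions cannot repeat
            best = sub
    return best


text = "fsalstackoverflowwqiovmnrflofmnastackovsnv"
print(longest_repeat_covering(text, 13))  # rflo
print(longest_repeat_covering(text, 32))  # stackov
```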

Check if last character in series is upper case and convert to lowercase if true

I have a column of strings that look similar to the following:
1 IX-1-a
2 IX-1-b
3 IX-1-C
4 IX-1-D
Some end in lowercase letters while others end in uppercase. I need to standardize all endings to lowercase without affecting the letters at the beginning of the string. Below is a code fragment I am working with to make changes within the series, but it doesn't quite work.
if i in tw4515['Unnamed: 0'].str[-1].str.isupper() == True:
    tw4515['Unnamed: 0'].str[-1].str.lower()
How can the truth table from tw4515['Unnamed: 0'].str[-1].str.isupper() be used efficiently to effect conditional changes?
One option is to split once from the right side, make the second part lowercase, then combine (s here is the column, e.g. s = tw4515['Unnamed: 0']):
tmp = s.str.rsplit('-', n=1)
out = tmp.str[0] + '-' + tmp.str[1].str.lower()
If the last part is always a single letter, @Barmar's solution is even better:
out = s.str[:-1] + s.str[-1].str.lower()
Output:
1 IX-1-a
2 IX-1-b
3 IX-1-c
4 IX-1-d
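To use the isupper() mask itself, as the question asks, one option is boolean indexing with .loc so that only the flagged rows are rebuilt. A minimal sketch, assuming non-empty strings and using hypothetical sample data that mirrors the column in the question:

```python
import pandas as pd

# Hypothetical data mirroring the column in the question
s = pd.Series(["IX-1-a", "IX-1-b", "IX-1-C", "IX-1-D"], index=[1, 2, 3, 4])

# Boolean mask: True where the final character is uppercase
mask = s.str[-1].str.isupper()

# Rebuild only the masked rows: everything but the last char + lowercased last char
fixed = s.copy()
fixed.loc[mask] = s[mask].str[:-1] + s[mask].str[-1].str.lower()

print(fixed.tolist())  # ['IX-1-a', 'IX-1-b', 'IX-1-c', 'IX-1-d']
```

The assignment aligns on the index, so rows where the mask is False keep their original value.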

How would I implement an rfind in Lua with the arguments?

For example, I would like to do something like this in lua:
s = "Hey\n There And Yea\n"
print(s.rfind("\n", 0, 5))
I've tried making this in lua with the string.find function:
local s = "Hey\n There And Yea\n"
local _, p = s:find(".*\n", -5)
print(p)
But these aren't producing the same results. What am I doing wrong, and how can I fix this to make it behave the same as rfind?
Lua has a little-known function, string.reverse, that reverses all characters of a string. While it is rarely needed, it can be used to perform a reverse search inside a string.
So to implement rfind, you search for the reversed substring inside the reversed original string, then do some arithmetic to obtain the offset in the original string.
Here is the code that mimics Python rfind:
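function rfind(subject, tofind, startIdx, endIdx)
  startIdx = startIdx or 0
  endIdx = endIdx or #subject
  -- take the search window (0-based start converted to Lua's 1-based sub) and reverse it
  subject = subject:sub(startIdx+1, endIdx):reverse()
  tofind = tofind:reverse()
  -- plain search (init 1, plain true) so characters such as "." are not treated as pattern items
  local idx = subject:find(tofind, 1, true)
  -- map the hit in the reversed window back to a 0-based offset in the original string
  return idx and #subject - #tofind - idx + startIdx + 1 or -1
end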
print(rfind("Hello World", "H")) --> 0
print(rfind("Hello World", "l")) --> 9
print(rfind("foo foo foo", "foo")) --> 8
print(rfind("Hello World", "Toto")) --> -1
print(rfind("Hello World", "l", 1, 4)) --> 3
Note that this version of rfind uses the Python index convention, starting at 0 and returning -1 if the string is not found. It would be more coherent in Lua to use a 1-based index and to return nil when there is no match. The modification would be trivial.
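Since Python's str.rfind is the reference behavior here, the reverse-search arithmetic can be checked directly against it. The following is a Python transcription of the same idea (rfind_via_reverse is a hypothetical name), assuming non-negative start/end values and a non-empty substring. Python's str.find is 0-based, so the trailing "+ 1" of the Lua formula disappears.

```python
from typing import Optional


def rfind_via_reverse(subject: str, tofind: str, start: int = 0, end: Optional[int] = None) -> int:
    """Mimic str.rfind by finding the reversed needle in the reversed window.
    Assumes non-negative start/end and a non-empty needle."""
    if end is None:
        end = len(subject)
    window = subject[start:end][::-1]  # reversed search window
    idx = window.find(tofind[::-1])    # 0-based here; Lua's find is 1-based
    if idx == -1:
        return -1
    # map the hit in the reversed window back to an offset in the original string
    return len(window) - len(tofind) - idx + start


print(rfind_via_reverse("Hello World", "l", 1, 4))          # 3
print(rfind_via_reverse("Hey\n There And Yea\n", "\n", 0, 5))  # 3
```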
The pattern I have written will only work for single-character substrings like the one the asker used as a test case. Skip ahead to the header "The answer" to see that answer, or read on for an explanation of some of the things that went wrong in the attempt. Skip to the final header for a general, but inefficient, solution for multi-character substrings.
I have tried to recreate the output of Python's mystring.rfind with Lua's mystring:find; it only works for single-character substrings. Later I will show you a function that handles all cases but relies on a fairly crude loop.
As a recap (to address what you're doing wrong), let's talk about mystringvar:find("pattern", index), which is sugar for string.find(mystringvar, "pattern", index). It returns the start and stop indexes of the match.
The optional index sets where the search starts, not where it ends, and a negative index counts back from the end of the string (an index of -1 searches only the last character, -2 the last two). This is not the desired behavior.
Instead of trying to use the index to create a substring, you should create a substring like this:
mystringvar:sub(i, j) extracts and returns the substring from i to j (1-indexed, inclusive end). So to recreate Python's 0 to 5 slice (0-indexed, exclusive end), use 1 to 5.
Now note that these methods can be chained into string:sub(x, y):find("") but I will break it up for ease of reading. Without further ado, I present you:
The answer
local s = "Hey\n There And Yea\n"
local substr = s:sub(1,5)
local start, fin = substr:find("\n[^\n]-$")
print(start, ",", fin)
I had a few half-measure solutions, but to make sure what I was writing would work with multiple instances of the substring (the 1-5 substring contains only one), I tested with both the substring and the whole string. Observe:
output with sub(1, 5): 4 , 5
output with sub(1, 19) (the whole length): 19 , 19
Both correctly report the beginning of the rightmost match (Lua indexes are 1-based, so 4 corresponds to the 3 that Python's rfind returns). Note that the "fin" index runs to the end of the string; I will explain why in a moment. This should be fine, because rfind only returns the starting index anyway, so start is an appropriate replacement.
Let's reread the code to see how it works:
sub I've already explained
There is no longer a need for index in string.find
Alright, what's this pattern "\n[^\n]-$"?
$ - anchor to the end of the string
[^x] - match "not x"
- - as few repetitions as possible (even 0) of the previous character or set (in this case, [^\n]). This means it still works if the string ends with your substring.
It begins with \n, so all together it means: "Find me a line break followed by no other line breaks, up to the end of the string." So even though your substring contains only one \n, if you were to use this on a string with several of them, you would still get the highest index, as rfind does.
Note that captures (()) do not change the start and end indexes that string.find returns (captured values only come back as extra results), so wrapping the \n in a group would not help. As a consequence, I cannot stop the end-anchoring $ from extending the fin variable to the end of the string.
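For comparison, the same "line break followed only by non-newline characters up to the end" idea can be written with Python's re module, which makes it easy to check the expected indexes. Two differences matter in this sketch: \Z is used instead of $, because Python's $ also matches just before a trailing newline, and Python supports lookahead, so the match can end at the newline itself.

```python
import re

s = "Hey\n There And Yea\n"

# Counterpart of the Lua pattern "\n[^\n]-$" applied to the first five characters
m = re.search(r"\n[^\n]*\Z", s[0:5])
print(m.start(), m.end())  # 3 5

# Lookahead: only the newline itself is consumed, so end() stays at the newline
m = re.search(r"\n(?=[^\n]*\Z)", s)
print(m.start(), m.end())  # 18 19
```

Python reports 0-based starts with exclusive ends, so (3, 5) is the same span as Lua's 4 , 5.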
I hope this works well for you.
Function to do this for substrings of any length
I will not be explaining this one.
function string.rfind(str, substr, plain) --plain is included for you to pass to find if you wish to ignore patterns
  assert(substr ~= "") --An empty substring would cause an endless loop. Bad!
  local plain = plain or false --default plain to false if not included
  local index = 0
  --[[
  Watch closely... we continually shift the starting point after each found index until nothing is left.
  At that point, we find the difference between the original string's length and the new string's length, to see how many characters we cut out.
  ]]--
  while true do
    local new_start, _ = string.find(str, substr, index, plain) --index will continually push up the string to after whenever the last index was.
    if new_start == nil then --no match is found
      if index == 0 then return nil end --if no match is found and the index was never changed, return nil (there was no match)
      return #str - #str:sub(index) --if no match is found and we have some index, do math.
    end
    --print("new start", new_start)
    index = new_start + 1 --ok, there was some kind of match. set our index to whatever that was, and add 1 so that we don't get stuck in a loop of rematching the start of our substring.
  end
end
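The loop above keeps searching forward, restarting one character past the previous hit, until nothing more is found; the last hit is the rightmost match. A Python transcription of that idea (rfind_by_repeated_find is a hypothetical name) can be compared with str.rfind. It returns a 0-based index and -1 on failure, whereas the Lua function returns a 1-based index or nil.

```python
def rfind_by_repeated_find(text, sub):
    """Rightmost index of `sub` in `text`, found by repeated forward searches."""
    if sub == "":
        raise ValueError("an empty substring would loop forever")
    last = -1
    hit = text.find(sub)
    while hit != -1:
        last = hit
        hit = text.find(sub, hit + 1)  # +1 so overlapping matches are still seen
    return last


print(rfind_by_repeated_find("foo foo foo", "foo"))  # 8
print(rfind_by_repeated_find("aaaa", "aa"))          # 2
```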

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
        x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
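partition can in fact be used in a loop: re-partition whatever remains after each separator. A minimal sketch (preceding_chunks and its defaults are hypothetical). Like the split-based approaches, it only sees the text since the previous separator, so a chunk is shorter than 8 characters when two ff's are closer together than that.

```python
def preceding_chunks(text, sep="ff", width=8):
    """Collect up to `width` characters before each occurrence of `sep`,
    re-partitioning the remainder on every pass."""
    chunks = []
    rest = text
    while True:
        head, found, rest = rest.partition(sep)
        if not found:                 # separator no longer present
            break
        chunks.append(head[-width:])  # last `width` chars before this separator
    return chunks


print(preceding_chunks("090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"))
# ['909a9090', '90434390', 'sdfs9000']
```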
Get the last 8 characters of each split using a slice operation in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
I wouldn't do this with split myself; I'd use str.find. This code isn't fancy, but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None  # offset of the next match
last = 0      # where the next search starts
l = 8         # how many preceding characters to grab
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    preceding = fullstr[max(0, found - l):found]  # clamp so a match near the start doesn't wrap around
    print("At position {} found preceding characters '{}' ".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.
