The longest prefix that is also suffix of two lists

The longest prefix that is also suffix of two lists - python

So I have two lists:
def function(w,w2): # => this is how I want to define my function (no more inputs than this 2 lists)
I want to know the biggest prefix of w which is also suffix of w2.
How can I do this only with logic (without importing anything)

I can try and help get you started on this problem, but it sort of sounds like a homework question so I won't give you a complete answer (per these guidelines).
If I were you I'd start with a small case and build up from there. Lets start with:
w = "ab"
w2 = "ba"
The function for this might look like:
def function(w,w2):
prefix = ""
# Does the first letter of w equal the last letter of w2?
if w[0] == w2[-1]:
prefix += w[0]
# What about the second letter?
if w[1] == w2[-2]:
prefix += w[1]
return prefix
Then when you run print(function(w,w2)) you get ab.
This code should work for 2 letter words, but what if the words are longer? This is when we would introduce a loop.
def function(w,w2):
prefix = ""
for i in range(0, len(w)):
if w[i] == w2[(i+1)*-1]:
prefix+= w[i]
else:
return prefix
return prefix
Hopefully this code will offer a good starting place for you! One issue with what I have written is what if w2 is shorter than w. Then you will get an index error! There are a few ways to solve this, but one way is to make sure that w is always the shorter word. Best of luck, and feel free to DM me if you have other questions.

A simple iterative approach could be:
Start from the longest possible prefix (i.e. all of w), and test it against a w2 suffix of the same length.
If they match, you can return it immediately, since it must be the longest possible match.
If they don't match, shorten it by one, and repeat.
If you never find a match, the answer is an empty string.
In code, this looks like:
>>> def function(w, w2):
... for i in range(len(w), 0, -1):
... if w[:i] == w2[-i:]:
... return w[:i]
... return ''
...
>>> function("asdfasdf", "qwertyasdf")
'asdf'
The slice operator (w[:i] for a prefix of length i, w2[-i:] for a suffix of length i) gracefully handles mismatched lengths by just giving you a shorter string if i is out of the range of the given string (which means they won't match, so the iteration is forced to continue until the lengths do match).
>>> function("aaaaaba", "ba")
'a'
>>> function("a", "abbbaababaa")
'a'

Related

find pattern in a string without using regex

I'm trying to find a pattern in a string. Example:
trail = 'AABACCCACCACCACCACCACC" one can note the "ACC" repetition after a prefix of AAB; so the result should be AAB(ACC)
Without using regex 'import re' how can I do this. What I did so far:
def get_pattern(trail):
for j in range(0,len(trail)):
k = j+1
while k<len(trail) and trail[j]!=trail[k]:
k+=1
if k==len(trail)-1:
continue
window = ''
stop = trail[j]
m = j
while m<len(trail) and k<len(trail) and trail[m]==trail[k]:
window+=trail[m]
m+=1
k+=1
if trail[m]==stop and len(window)>1:
break
if len(window)>1:
prefix=''
if j>0:
prefix = trail[0:j]
return prefix+'('+window+')'
return False
This will do (almost) the trick because in a use case like this:
"AAAAAAAAAAAAAAAAAABDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBD"
the result is AA but it should be: AAAAAAAAAAAAAAAAAA(BD)

The issue with your code is that once you find a repetition that is of length 2 or greater, you don't check forward to make sure it's maintained. In your second example, this causes it to grab onto the 'AA' without seeing the 'BD's that follow.
Since we know we're dealing with cases of prefix + window, it makes sense to instead look from the end rather than the beginning.
def get_pattern(string):
str_len = len(string)
splits = [[string[i-rep_length: i] for i in range(str_len, 0, -rep_length)] for rep_length in range(1, str_len//2)]
reps = [[window == split[0] for window in split].index(False) for split in splits]
prefix_lengths = [str_len - (i+1)*rep for i,rep in enumerate(reps)]
shortest_prefix_length = min(prefix_lengths)
indices = [i for i, pre_len in enumerate(prefix_lengths) if pre_len == shortest_prefix_length]
reps = list(map(reps.__getitem__, indices))
splits = list(map(splits.__getitem__, indices))
max_reps = max(reps)
window = splits[reps.index(max_reps)][0]
prefix = string[0:shortest_prefix_length]
return f'{prefix}({window})' if max_reps > 1 else None
splits uses list comprehension to create a list of lists where each sublist splits the string into rep_length sized pieces starting from the end.
For each sublist split, the first split[0] is our proposed pattern and we see how many times that it's repeated. This is easily done by finding the first instance of False when checking window == split[0] using the list.index() function. We also want to calculate the size of the prefix. We want the shortest prefix with the largest number of reps. This is because of nasty edge cases like jeifjeiAABBBBBBBBBBBBBBAABBBBBBBBBBBBBBAABBBBBBBBBBBBBBAABBBBBBBBBBBBBB where the window has B that repeats more than the window itself. Additionally, anything that repeats 4 times can also be seen as a double-sized window repeated twice.
If you want to deal with an additional suffix, we can do a hacky solution by just trimming from the end until get_pattern() returns a pattern and then just append what was trimmed:
def get_pattern_w_suffix(string):
for i in range(len(string), 0, -1):
pattern = get_pattern(string[0:i])
suffix = string[i:]
if pattern is not None:
return pattern + suffix
return None
However, this assumes that the suffix doesn't have a pattern itself.

Recursively searching for a string in a list of characters

I have a problem to solve which is to recursively search for a string in a list (length of string and list is atleast 2) and return it's positions. for example: if we had ab with the list ['a','b','c'], the function should return '(0,2)', as ab starts at index 0 and ends at 1 (we add one more).
if we had bc with the same list the function should return '(1,3)'.
if we had ac with the same list the function should return not found.
Note that I'm solving a bigger problem which is to recursively search for a string in a matrix of characters (that appears from up to down, or left to right only), but I am nowhere near the solution, so I'm starting by searching for a word in a row of a matrix on a given index (as for searching for a word in a normal list), so my code might have char_mat[idx], treat it as a normal list like ['c','d','e'] for example.
Note that my code is full of bugs and it doesn't work, so I explained what I tried to do under it.
def search_at_idx(search_word, char_mat, idx, start, end):
if len(char_mat[idx]) == 2:
if ''.join(char_mat[idx]) == search_word:
return 0,2
else:
return 'not found', 'not found'
start, end = search_at_idx(search_word, char_mat[idx][1:], idx, start+1, end)
return start, end
The idea of what I tried to do here is to find the base of the recursion (when the length of the list reaches 2), and with that little problem I just check if my word is equal to the chars when joined together as a string, and return the position of the string if it's equal else return not found
Then for the recursion step, I send the list without the first character, and my start index +1, so if this function does all the job for me (as the recursion hypothesis), I need to check the last element in the list so my recursion works. (but I don't know really if this is the way to do it since the last index can be not in the word, so I got stuck). Now I know that I made alot of mistakes and I'm nowhere near the correct answer,I would really appreciate any explanation or help in order to understand how to do this problem and move on to my bigger problem which is finding the string in a matrix of chars.

I've thrown together a little example that should get you a few steps ahead
char_mat = [['c', 'e', 'l', 'k', 'v'],]
search_word = 'lk'
def search_at_idx(search_word, char_mat, idx, start=0):
if len(char_mat[idx]) < len(search_word):
return 'not', 'found'
if ''.join(char_mat[idx][:len(search_word)]) == search_word:
return start, start+len(search_word)
char_mat[idx] = char_mat[idx][1:]
start, end = search_at_idx(search_word, char_mat, idx, start+1)
return start, end
print(search_at_idx(search_word, char_mat, 0))
To point out a few errors of yours:
In your recursion, you use char_mat[idx][1:]. This will pass a slice of the list and not the modified matrix. That means your next call to char_mat[idx] will check the letter at that index in the array. I'll recommend using the debugger and stepping through the program to check the contents of your variables
Instead of using start and end, you can always assume that the found word has the same length as the word you are searching for. So the distance you have to look is always start + len(search_word)
If you have any additional questions about my code, please comment.
Here's an example for list comprehension if that counts as loophole:
foundword = list(map("".join, list(zip(*([char_mat[idx][i:] + list(char_mat[idx][i-1]) for i in range(len(search_word))])))[:-1])).index(search_word)
print((foundword, foundword + len(search_word)) if foundword else 'Not found')

l = ["a","b","c"]
def my_indexes(pattern, look_list, indx_val):
if pattern == "".join(look_list)[:2]:
return indx_val, indx_val+1
else:
if len(look_list) == 2:
return None
return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("bc",l,0))
Two options:
1.We find the case we are looking for, so the first two elements of our list are "ab", or
2. "a" and "b" are not first two elements of our list. call the same function without first element of the list,and increase indx_val so our result will be correct.We stop doing this when the len(list) = 2 and we didn't find a case. (assuming we're looking for length of 2 chars)
edit: for all lengths
l = ["a","b","c","d"]
def my_indexes(pattern, look_list, indx_val):
if pattern == "".join(look_list)[:len(pattern)]:
return indx_val, indx_val+len(pattern) # -1 to match correct indexes
else:
if len(look_list) == len(pattern):
return None
return my_indexes(pattern, look_list[1:],indx_val+1)
print(my_indexes("cd",l,0))

how to make an imputed string to a list, change it to a palindrome(if it isn't already) and reverse it as a string back

A string is palindrome if it reads the same forward and backward. Given a string that contains only lower case English alphabets, you are required to create a new palindrome string from the given string following the rules gives below:
1. You can reduce (but not increase) any character in a string by one; for example you can reduce the character h to g but not from g to h
2. In order to achieve your goal, if you have to then you can reduce a character of a string repeatedly until it becomes the letter a; but once it becomes a, you cannot reduce it any further.
Each reduction operation is counted as one. So you need to count as well how many reductions you make. Write a Python program that reads a string from a user input (using raw_input statement), creates a palindrome string from the given string with the minimum possible number of operations and then prints the palindrome string created and the number of operations needed to create the new palindrome string.
I tried to convert the string to a list first, then modify the list so that should any string be given, if its not a palindrome, it automatically edits it to a palindrome and then prints the result.after modifying the list, convert it back to a string.
c=raw_input("enter a string ")
x=list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
if x[i] < x[j]:
a += ord(x[j]) - ord(x[i])
x[j] = x[i]
print x
else:
a += ord(x[i]) - ord(x[j])
x [i] = x[j]
print x
i = i + 1
j = (len(x)-1)-1
print "The number of operations is ",a print "The palindrome created is",( ''.join(x) )
Am i approaching it the right way or is there something I'm not adding up?

Since only reduction is allowed, it is clear that the number of reductions for each pair will be the difference between them. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
Now difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using absolute value to avoid checking which alphabet has higher ASCII value.
I hope this approach will help.
s=input()
l=len(s)
count=0
m=0
n=l-1
while m<n:
count+=abs(ord(s[m])-ord(s[n]))
m+=1
n-=1
print(count)

This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick here is to utilize string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
half, rem = divmod(len(string), 2)
a, b, c = '', '', ''
a, b = string[:half], string[half:]
if rem > 0:
b, c = string[half + 1:], string[rem + 1]
return (a, b, c)
The above should recreate the string if you do a + c + b. Next is you have to convert a and b to lists and map the ord function on each half. Leave the remainder alone, if any.
def convert_to_ord_list(string):
return map(ord, list(string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
halfone, halftwo, rem = split_string_to_halves(string)
if halfone == halftwo[::-1]:
return halfone + halftwo + rem, 0
halftwo = halftwo[::-1]
zipped = zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo))
counter = sum([max(x) - min(x) for x in zipped])
floors = [min(x) for x in zipped]
res = "".join(map(chr, floors))
res += rem + res[::-1]
return res, counter
Finally, some tests:
target = 'ideal'
print convert_to_palindrome(target) # ('iaeai', 6)
target = 'euler'
print convert_to_palindrome(target) # ('eelee', 29)
target = 'ohmygodthisisinsane'
print convert_to_palindrome(target) # ('ehasgidihmhidigsahe', 84)
I'm not sure if this is optimized nor if I covered all bases. But I think this pretty much covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck and let us know how this works for you.

Realizing if there is a pattern in a string (Does not need to start at index 0, could be any length)

Coding a program to detect a n-length pattern in a string, even without knowing where the pattern starts, could be easily done by creating a list of n-length substrings and check if starting at one point there are same items or the rest of the list. Without any piece of information other than the string to check through, is the only way to recognize the pattern is to brute-force through all lengths and check or is there a more efficient algorithm?
(I'm just a beginner in Python, so this may be easy to code... )
Current code that only suits checking for starting at index 0:
def search(s):
match=s[0]+s[1]
while (match != s) and (match[0] != match[-1]):
for matchLen in range(len(match),len(s)-1):
letter = s[matchLen]
if letter == match[-1]:
match += s[len(match)]
break
if match == s:
return None
else:
return match[:-1]

You can use re.findall(r'(.{2,})\1+', string). The parentheses creates a capture group that is later backreferenced by \1. The . matches any character (except for line breaks). The {2,} requires the pattern to be at least two characters long (otherwise strings like ss would be considered a pattern). Finally the + requires that pattern to repeat 1 or more times (in addition to the first time that it occurred inside the capture group). You can see it working in action.

Pattern is a far too vague term, but assuming you mean some string repeating itself, the regexp (?P<pat>.+)(?P=pat) will work.

Given a string what you could do is -
You start with length = 1, and take two pointer variables i and j which you shall use to traverse the string.
Set i = 0 and j = i+length
if str[i]==str[j]:
i++,j++ // till j not equal to length of string
else:
length = length + 1
//increase length by 1 and start the algorithm over from i = 0
Take the example abcdeabcde :
In this we see
Initially i = 0, j = 1 ,
but str[0]!=str[1] i.e. a!=b,
Then we get length = 2 i.e., i = 0,j = 2
but str[0]!=str[2] i.e. a!=c,
Continuing in the same fashion,
We see when length = 5 and i = 0 and j = 5,
str[0]==str[5]
and thus you can see that i and j increment till j is equal to string length.
And you have your answer that is the pattern length. It may not seem obvious but i would suggest you dry-run this algorithm over some of your test cases and let me know the results.

You can use re.findall() to find all matches:
import re
s = "somethingabcdeabcdeabcdeabcdeabcdeelseabcdeabcdeabcde"
li = re.findall(r'abcde',s)
print(li)
Output:
['abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde']

Anagrams code resulting in infinite results

I need to generate anagrams for an application. I am using the following code for generating anagrams
def anagrams(s):
if len(s) < 2:
return s
else:
tmp = []
for i, letter in enumerate(s):
for j in anagrams(s[:i]+s[i+1:]):
tmp.append(j+letter)
print (j+letter)
return tmp
The code above works in general. However, it prints infinite results when the following string is passed
str = "zzzzzzziizzzz"
print anagrams(str)
Can someone tell me where I am going wrong? I need unique anagrams of a string

This is not an infinity of results, this is 13!(*) words (a bit over 6 billions); you are facing a combinatorial explosion.
(*) 13 factorial.

Others have pointed out that your code produces 13! anagrams, many of them duplicates. Your string of 11 z's and 2 i's has only 78 unique anagrams, however. (That's 13! / (11!·2!) or 13·12 / 2.)
If you want only these strings, make sure that you don't recurse down for the same letter more than once:
def anagrams(s):
if len(s) < 2:
return s
else:
tmp = []
for i, letter in enumerate(s):
if not letter in s[:i]:
for j in anagrams(s[:i] + s[i+1:]):
tmp.append(letter + j )
return tmp
The additional test is probably not the most effective way to tell whether a letter has already been used, but in your case with many duplicate letters it will save a lot of recursions.

There isn't infinte results - just 13! or 6,227,020,800
You're just not waiting long enough for the 6 billion results.
Note that much of the output is duplicates. If you are meaning to not print out the duplicates, then the number of results is much smaller.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

The longest prefix that is also suffix of two lists - python

So I have two lists: def function(w,w2): # => this is how I want to define my function (no more inputs than this 2 lists) I want to know the biggest prefix of w which is also suffix of w2. How can I do this only with logic (without importing anything)

Related

find pattern in a string without using regex

Recursively searching for a string in a list of characters

how to make an imputed string to a list, change it to a palindrome(if it isn't already) and reverse it as a string back

Realizing if there is a pattern in a string (Does not need to start at index 0, could be any length)

Anagrams code resulting in infinite results

Categories

Resources