Python recursive permutation program explanation - python

I have code that uses recursion to compute the permutations of the characters of a string. I understand ordinary tail recursion and the recursive solutions for palindromes, factorials and decimal-to-binary conversion, but I am having trouble understanding how this recursion works. I mean how it actually works in the background; I already get the abstract, high-level idea.
Here is the code:
from __future__ import print_function
def permutef(s):
    #print('\nIM CALLED\n')
    out = []
    if len(s) == 1:
        out = [s]
    else:
        for i,let in enumerate(s):
            #print('LETTER IS {} index is {}'.format(let, i))
            #Slicing as not including that letter but includes every letter except that to perform the permutation
            for perm in permutef( s[:i] + s[i+1:] ):
                print(perm)
                out += [let + perm]
    return out
per = permutef('abc')
print('\n\n\n', per, '\n\n\n')
I tried tracing it on paper, with one circle for each letter, to see how the corresponding stack frames pop.
Don't ask about my handwriting, I know it's awesome (sarcasm).
Here is the output screenshot.
I want to understand the nitty-gritty of how this works in the background, but I can't seem to fathom the concept. Many thanks in advance.

 1  def permutef(s):
 2      out = []
 3      if len(s) == 1:
 4          out = [s]
 5      else:
 6          for i,let in enumerate(s):
 7              for perm in permutef( s[:i] + s[i+1:] ):
 8                  print(perm)
 9                  out += [let + perm]
10      return out
The principle is fairly straightforward. A one-character string (line 3) only has one permutation, represented by a list containing that character (line 4). The permutations of longer strings are generated by taking each character in the string and permuting the remaining characters - a fairly classic recursive divide-and-conquer approach.
For problems like this the Python Tutor site can be useful to visualise the execution of your code. The link I've provided is pre-loaded with the code above, and you can step forwards and backwards through the code until you understand how it works.
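To watch the call stack without a visualiser, the same algorithm can be instrumented to log every call and return. The following is a minimal sketch, not part of the original code: the name permute_traced and the trace and depth parameters are introduced here purely for illustration.

```python
def permute_traced(s, trace, depth=0):
    """Same algorithm as permutef, but logs every call and return.

    `trace` is a list that collects one line per event; `depth` is the
    current recursion depth, used only for indentation.
    """
    indent = "  " * depth
    trace.append("{}call   {!r}".format(indent, s))
    out = []
    if len(s) == 1:
        out = [s]
    else:
        for i, let in enumerate(s):
            # The inner call runs to completion (and its frame is popped)
            # before the loop body sees any of its results.
            for perm in permute_traced(s[:i] + s[i + 1:], trace, depth + 1):
                out += [let + perm]
    trace.append("{}return {!r}".format(indent, out))
    return out


trace = []
perms = permute_traced("abc", trace)
print("\n".join(trace))
print(perms)
```

Each "call" line corresponds to a new stack frame being pushed, and the matching "return" line at the same indentation to that frame being popped. The deeper the indentation, the more frames are on the stack at that moment.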

Related

for j in anagram(word[:i] + word[i+1:]): <- how it works?

I built an anagram generator. It works, but I don't understand how the for loop over a function call works at line 8. Why does it work only with
for j in anagram(word[:i] + word[i+1:]):
and not with
for j in anagram(word):
Also, I want to know what
for j in anagram(...)
means and does.
What is j doing in this for loop?
This is my full code:
def anagram(word):
    n = len(word)
    anagrams = []
    if n <= 1:
        return word
    else:
        for i in range(n):
            for j in anagram(word[:i] + word[i+1:]):
                anagrams.append(word[i:i+1] + j)
        return anagrams
if __name__ == "__main__":
    print(anagram("abc"))
The reason you can't write for i in anagram(word) is that it creates an infinite loop.
So for example if I write the recursive factorial function,
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)
This works and is not a circular definition because I am giving the computer two separate equations to compute the factorial:
n! = 1
n! = n (n-1)!
and I am telling it when to use each of these: the first one when n is 0 or 1, the second when n is larger than that. The key to its working is that eventually we stop using the second definition, and we instead use the first definition, which is called the “base case.” If I were to instead say another true definition like that n! = n! the computer would follow those instructions but we would never reduce down to the base case and so we would enter an infinite recursive loop. This loop would probably exhaust a resource called the “stack” rapidly, leading to errors about “excessive recursion” or too many “stack frames” or just “stack overflow” (for which this site is named!). And then if you gave it a mathematically invalid expression like n! = n n! it would infinitely loop and also it would be wrong even if it did not infinitely loop.
Factorials and anagrams are closely related; in fact, we can say mathematically that
len(anagrams(f)) == fact(len(f))
so solving one means solving the other. In this case we are saying that the anagram of a word which is empty or of length 1 is just [word], the list containing just that word. (Your algorithm messes this case up a little bit, so it's a bug.)
The anagram of any other word must have something to do with anagrams of words of length len(word) - 1. So what we do is we pull each character out of the word and put it at the front of the anagram. So word[:i] + word[i+1:] is the word except it is missing the letter at index i, and word[i:i+1] is the space between these -- in other words it is the letter at index i.
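As a sketch of the base-case point, here is the same algorithm with the base case returning a list, so that every call returns the same type. It assumes the intended result for an empty or one-letter word is [word], as described above.

```python
def anagram(word):
    # Base case: an empty or one-letter word has exactly one arrangement,
    # returned as a list so that every call returns the same type.
    if len(word) <= 1:
        return [word]
    anagrams = []
    for i in range(len(word)):
        # The word without the letter at index i is strictly shorter,
        # so the recursion always moves towards the base case.
        rest = word[:i] + word[i + 1:]
        for j in anagram(rest):
            # j is one anagram (a string) of the shorter word.
            anagrams.append(word[i] + j)
    return anagrams


print(anagram("abc"))
print(anagram("a"))
```

With the original base case, anagram("a") returns the string "a" rather than a list; the recursion still works only because looping over a one-character string happens to yield that character.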
This is NOT an answer but a guide for you to understand the logic by yourself.
First, you should understand one thing: anagram(word[:i] + word[i+1:]) is not the same as anagram(word).
>>> a = 'abcd'
>>> a[:2] + a[(2+1):]
'abd'
You can clearly see the difference.
For a clearer understanding, I recommend printing the word at every level of the recursion: put a print(word) statement before the loop starts.

Permutation algorithm analysis

I've been studying algorithms like crazy for a big interview. This particular algorithm is driving me crazy. I've added comments to the lines that I don't understand.
def permute(s):
    out = []
    if len(s) == 1:
        # wouldn't setting out replace everything in out?
        out = [s]
    else:
        for i, let in enumerate(s):
            # how does it know that I only want 2 strings?
            for perm in permute(s[:i] + s[i+1:]):
                out += [let + perm]
    return out
print(permute("cat"))
Is it correct to say that the time complexity of this algorithm is O(n!)?
out is defined inside the body of permute, so each call gets its own out list. When you assign out = [s], you are only replacing the out = [] of that particular call, not the lists that belong to other calls.
If the input is longer than one character, this is what happens:
# Iterate for each char
for i, let in enumerate(s):
    # Iterate for each permutation of the string without the char i
    for perm in permute(s[:i] + s[i+1:]):
        # Put the removed char in the beginning of the permutation
        # and add it to the list.
        out += [let + perm]
Just for fun, here's a generator version of that algorithm. It's a bit nicer because it doesn't require those out lists.
def permute(s):
    if len(s) == 1:
        yield s
    else:
        for i, let in enumerate(s):
            for perm in permute(s[:i] + s[i+1:]):
                yield let + perm
for s in permute("abc"):
    print(s)
output
abc
acb
bac
bca
cab
cba
Of course, it's almost always better to avoid recursion (especially in Python) unless the problem needs recursion (eg processing recursive data structure, like trees). And of course a real Python program would normally use itertools.permutations, unless it needs to correctly handle repeating items in the base sequence. In that case, I recommend the iterative algorithm of Narayana Pandita, as shown in this answer.
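For reference, here is a minimal sketch of the iterative next-permutation algorithm attributed to Narayana Pandita, which yields each distinct arrangement once even when letters repeat. This is not the code from the linked answer; the function name unique_permutations is made up for this example, and it assumes the input is a string.

```python
import itertools


def unique_permutations(seq):
    """Yield each distinct permutation of the string seq once, in sorted order."""
    a = sorted(seq)
    n = len(a)
    while True:
        yield "".join(a)
        # 1. Find the rightmost position j with a[j] < a[j + 1].
        j = n - 2
        while j >= 0 and a[j] >= a[j + 1]:
            j -= 1
        if j < 0:
            return  # a is in descending order: that was the last permutation
        # 2. Find the rightmost k > j with a[k] > a[j] and swap the two.
        k = n - 1
        while a[k] <= a[j]:
            k -= 1
        a[j], a[k] = a[k], a[j]
        # 3. Reverse the tail after j so it is ascending again.
        a[j + 1:] = a[j + 1:][::-1]


print(list(unique_permutations("aab")))
# itertools.permutations treats equal letters as distinct positions:
print(len(list(itertools.permutations("aab"))))
```

For "aab" the generator yields three strings, whereas itertools.permutations produces six tuples, each arrangement appearing twice.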

Finding longest alphabetical substring - understanding the concepts in Python

I am completing the Introduction to Computer Science and Programming Using Python Course and am stuck on Week 1: Python Basics - Problem Set 1 - Problem 3.
The problem asks:
Assume s is a string of lower case characters.
Write a program that prints the longest substring of s in which the
letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then your program should print
Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print
Longest substring in alphabetical order is: abc
There are many posts on Stack Overflow where people are just chasing or giving the code as the answer. I am looking to understand the concept behind the code, as I am new to programming and want to gain a better understanding of the basics.
I found the following code that seems to answer the question. I understand the basic concept of the for loop, but I am having trouble understanding how to use for loops to find alphabetical sequences in a string.
Can someone please help me understand the concept of using for loops in this way?
s = 'cyqfjhcclkbxpbojgkar'
lstring = s[0]
slen = 1
for i in range(len(s)):
    for j in range(i,len(s)-1):
        if s[j+1] >= s[j]:
            if (j+1)-i+1 > slen:
                lstring = s[i:(j+1)+1]
                slen = (j+1)-i+1
        else:
            break
print("Longest substring in alphabetical order is: " + lstring)
Let's go through your code step by step.
First we assume that the first character forms the longest sequence. What we will do is try improving this guess.
s = 'cyqfjhcclkbxpbojgkar'
lstring = s[0]
slen = 1
The first loop then picks some index i, which will be the start of a sequence. From there, we check the sequences starting at i by looping over the possible ends of a sequence with the nested loop.
for i in range(len(s)): # This loops over all indices of the string
    for j in range(i,len(s)-1): # This loops over the indices following i
These nested loops allow us to check every candidate substring by picking combinations of i and j.
The first if statement intends to check if that sequence is still an increasing one. If it is not we break the inner loop as we are not interested in that sequence.
if s[j+1] >= s[j]:
    ...
else:
    break
We finally need to check if the current sequence we are looking at is better than our current guess by comparing its length to slen, which is our best guess.
if (j+1)-i+1 > slen:
    lstring = s[i:(j+1)+1]
    slen = (j+1)-i+1
Improvements
Note that this code is not optimal, as it needlessly traverses your string multiple times. You could implement a more efficient approach that traverses the string only once to recover all increasing substrings and then uses max to pick the longest one.
s = 'cyqfjhcclkbxpbojgkar'
substrings = []
start = 0
end = 1
while end < len(s):
    if s[end - 1] > s[end]:
        # The run s[start:end] just ended; the next run starts at end.
        substrings.append(s[start:end])
        start = end
    end += 1
# The last run reaches the end of the string and must be added too.
substrings.append(s[start:])
lstring = max(substrings, key=len)
print("Longest substring in alphabetical order is: " + lstring)
The list substrings looks like this after the while-loop and the final append: ['cy', 'q', 'fj', 'h', 'ccl', 'k', 'bx', 'p', 'bo', 'j', 'gk', 'ar']
From these, max(..., key=len) picks the longest one; in case of ties it returns the first.
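The single-pass idea can also be written without the intermediate list, by remembering only the best run seen so far. This is a sketch under that assumption; the function name longest_alphabetical_substring and its variable names are chosen here for illustration.

```python
def longest_alphabetical_substring(s):
    """Return the first longest substring of s whose letters are in order."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    start = 0  # start of the run currently being scanned
    for end in range(1, len(s) + 1):
        # A run ends at the end of the string or where the order breaks.
        if end == len(s) or s[end] < s[end - 1]:
            # Strict '>' keeps the earlier run when two runs tie in length.
            if end - start > best_len:
                best_start, best_len = start, end - start
            start = end
    return s[best_start:best_start + best_len]


print(longest_alphabetical_substring('azcbobobegghakl'))
print(longest_alphabetical_substring('abcbcd'))
```

The two calls use the examples from the problem statement, which expects beggh and abc respectively.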

Anagrams code resulting in infinite results

I need to generate anagrams for an application. I am using the following code for generating anagrams
def anagrams(s):
    if len(s) < 2:
        return s
    else:
        tmp = []
        for i, letter in enumerate(s):
            for j in anagrams(s[:i]+s[i+1:]):
                tmp.append(j+letter)
                print (j+letter)
        return tmp
The code above works in general. However, it prints infinite results when the following string is passed:
str = "zzzzzzziizzzz"
print(anagrams(str))
Can someone tell me where I am going wrong? I need the unique anagrams of a string.
This is not an infinite number of results; it is 13! (13 factorial) words, a bit over 6 billion. You are facing a combinatorial explosion.
Others have pointed out that your code produces 13! anagrams, many of them duplicates. Your string of 11 z's and 2 i's has only 78 unique anagrams, however. (That's 13! / (11!·2!) or 13·12 / 2.)
If you want only these strings, make sure that you don't recurse down for the same letter more than once:
def anagrams(s):
    if len(s) < 2:
        return s
    else:
        tmp = []
        for i, letter in enumerate(s):
            if letter not in s[:i]:
                for j in anagrams(s[:i] + s[i+1:]):
                    tmp.append(letter + j)
        return tmp
The additional test is probably not the most effective way to tell whether a letter has already been used, but in your case with many duplicate letters it will save a lot of recursions.
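As a quick check of the count, the sketch below uses a set of already-used first letters instead of the slice test and compares the number of results with 13! / (11!·2!). The names unique_anagrams and seen are made up for this example.

```python
from math import factorial


def unique_anagrams(s):
    """Return the distinct anagrams of s, each exactly once."""
    if len(s) < 2:
        return [s]
    result = []
    seen = set()  # first letters already tried at this level
    for i, letter in enumerate(s):
        if letter in seen:
            continue  # same first letter would only repeat earlier results
        seen.add(letter)
        for rest in unique_anagrams(s[:i] + s[i + 1:]):
            result.append(letter + rest)
    return result


words = unique_anagrams("zzzzzzziizzzz")
print(len(words))
print(factorial(13) // (factorial(11) * factorial(2)))
```

Both lines print 78, matching the count given above.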
There aren't infinite results, just 13!, or 6,227,020,800.
You're just not waiting long enough for the 6 billion results.
Note that much of the output is duplicates. If you don't mean to print the duplicates, the number of results is much smaller.

Word segmentation using dynamic programming

So first off, I'm very new to Python, so if I'm doing something awful, I apologize in advance. I've been assigned this problem:
We want to devise a dynamic programming solution to the following problem: there is a string of characters which might have been a sequence of words with all the spaces removed, and we want to find a way, if any, in which to insert spaces that separate valid English words. For example, theyouthevent could be from “the you the vent”, “the youth event” or “they out he vent”. If the input is theeaglehaslande, then there’s no such way. Your task is to implement a dynamic programming solution in two separate ways:
iterative bottom-up version
recursive memoized version
Assume that the original sequence of words had no other punctuation (such as periods), no capital letters, and no proper names - all the words will be available in a dictionary file that will be provided to you.
So I'm having two main issues:
I know that this can and should be done in O(N^2), and I don't think mine is
The lookup table doesn't seem to store all the words, so it isn't reducing the time complexity
What I'd like:
Any kind of input (a better way to do it, something you see wrong in the code, how I can get the lookup table working, how to use the table of booleans to build a sequence of valid words)
Some idea of how to tackle the recursive version, although I feel that once I am able to solve the iterative solution I will be able to engineer the recursive one from it.
As always, thanks for any time and effort anyone gives this; it is always appreciated.
Here's my attempt:
#dictionary function returns True if word is found in dictionary false otherwise
def dictW(s):
    diction = open("diction10k.txt",'r')
    for x in diction:
        x = x.strip("\n \r")
        if s == x:
            return True
    return False
def iterativeSplit(s):
    n = len(s)
    i = j = k = 0
    A = [-1] * n
    word = [""] * n
    booly = False
    for i in range(0, n):
        for j in range(0, i+1):
            prefix = s[j:i+1]
            for k in range(0, n):
                if word[k] == prefix:
                    #booly = True
                    A[k] = 1
                    #print "Array below at index k %d and word = %s"%(k,word[k])
                    #print A
            # print prefix, A[i]
            if(((A[i] == -1) or (A[i] == 0))):
                if (dictW(prefix)):
                    A[i] = 1
                    word[i] = prefix
                    #print word[i], i
                else:
                    A[i] = 0
    for i in range(0, n):
        print(A[i])
For another real-world example of how to do English word segmentation, look at the source of the Python wordsegment module. It's a little more sophisticated because it uses word and phrase frequency tables but it illustrates the memoization approach.
In particular, segment illustrates the memoization approach:
def segment(text):
    "Return a list of words that is the best segmentation of `text`."
    memo = dict()
    def search(text, prev='<s>'):
        if text == '':
            return 0.0, []
        def candidates():
            for prefix, suffix in divide(text):
                prefix_score = log10(score(prefix, prev))
                pair = (suffix, prefix)
                if pair not in memo:
                    memo[pair] = search(suffix, prefix)
                suffix_score, suffix_words = memo[pair]
                yield (prefix_score + suffix_score, [prefix] + suffix_words)
        return max(candidates())
    result_score, result_words = search(clean(text))
    return result_words
If you replaced the score function so that it returned "1" for a word in your dictionary and "0" if not then you would simply enumerate all positively scored candidates for your answer.
Here is the solution in C++. Read and understand the concept, and then implement.
This video is very helpful for understanding DP approach.
Another approach that I feel can help is the trie data structure. It is a better way to solve the above problem.
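For the assignment as stated, here is a minimal sketch of both requested variants. It assumes the dictionary has been loaded once into a set; WORDS below is a small hypothetical stand-in for diction10k.txt, and the function names are made up for this example. Each variant returns one valid segmentation, or None if there is none, and performs O(N^2) set lookups.

```python
from functools import lru_cache

# Hypothetical stand-in for the provided dictionary file, loaded once.
WORDS = {"the", "they", "you", "youth", "out", "he",
         "event", "vent", "eagle", "has", "landed"}


def split_bottom_up(s, words=WORDS):
    """Iterative bottom-up version."""
    n = len(s)
    # back[i] == j means s[:j] can be split and s[j:i] is a word;
    # -1 means s[:i] cannot be split. The empty prefix is trivially fine.
    back = [-1] * (n + 1)
    back[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            if back[j] != -1 and s[j:i] in words:
                back[i] = j
                break
    if back[n] == -1:
        return None
    # Walk the back-pointers from the end to rebuild the words.
    result = []
    i = n
    while i > 0:
        result.append(s[back[i]:i])
        i = back[i]
    return result[::-1]


def split_memoized(s, words=WORDS):
    """Recursive memoized version."""
    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of words covering s[start:], or None.
        if start == len(s):
            return ()
        for end in range(start + 1, len(s) + 1):
            if s[start:end] in words:
                rest = solve(end)
                if rest is not None:
                    return (s[start:end],) + rest
        return None

    found = solve(0)
    return None if found is None else list(found)


print(split_bottom_up("theyouthevent"))
print(split_memoized("theyouthevent"))
print(split_bottom_up("theeaglehaslande"))
```

The two variants may return different segmentations of the same input, since several can be valid. Looking words up in a set avoids rescanning the dictionary file for every candidate prefix.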
