Permutation algorithm analysis - python

I've been studying algorithms like crazy for a big interview, and this particular algorithm is driving me crazy. I've added comments to the lines that I don't understand.
def permute(s):
    out = []
    if len(s) == 1:
        # wouldn't setting out replace everything in out?
        out = [s]
    else:
        for i, let in enumerate(s):
            # how does it know that I only want 2 strings?
            for perm in permute(s[:i] + s[i+1:]):
                out += [let + perm]
    return out
print permute("cat")
Is it correct to say that the time complexity of this algorithm is O(n!)?

out is defined inside the body of permute, so each call gets its own out list. When you assign out = [s], you are only rebinding the out = [] that belongs to that particular call; the out lists of the other calls are untouched.
If the input is longer than one character, this is what happens:
# Iterate for each char
for i, let in enumerate(s):
    # Iterate for each permutation of the string without the char i
    for perm in permute(s[:i] + s[i+1:]):
        # Put the removed char in the beginning of the permutation
        # and add it to the list.
        out += [let + perm]
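On the complexity question, one rough way to check is to count the results. The sketch below (Python 3; the counting loop is an addition for illustration, not part of the original code) suggests that a string of n distinct characters yields n! permutations, so the running time grows at least as fast as n!. The slicing and string concatenation done for every result add further work on top of that.

```python
from math import factorial

def permute(s):
    out = []
    if len(s) == 1:
        out = [s]
    else:
        for i, let in enumerate(s):
            for perm in permute(s[:i] + s[i+1:]):
                out += [let + perm]
    return out

for word in ("ab", "cat", "abcd", "abcde"):
    perms = permute(word)
    # With distinct characters, the number of results equals n!.
    print(word, len(perms), factorial(len(word)))
```

Each printed row shows the word, the number of permutations produced, and n! for comparison.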

Just for fun, here's a generator version of that algorithm. It's a bit nicer because it doesn't require those out lists.
def permute(s):
    if len(s) == 1:
        yield s
    else:
        for i, let in enumerate(s):
            for perm in permute(s[:i] + s[i+1:]):
                yield let + perm
for s in permute("abc"):
    print(s)
Output:
abc
acb
bac
bca
cab
cba
Of course, it's almost always better to avoid recursion (especially in Python) unless the problem needs recursion (e.g. processing a recursive data structure, like a tree). And of course a real Python program would normally use itertools.permutations, unless it needs to correctly handle repeating items in the base sequence. In that case, I recommend the iterative algorithm of Narayana Pandita, as shown in this answer.
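For reference, here is a minimal sketch of the Narayana Pandita next-permutation step in Python 3. It is my own rendering of the standard algorithm, not the code from the linked answer, and the function name lexicographic_permutations is made up for this example. It starts from the sorted sequence and repeatedly steps to the next arrangement in lexicographic order, so repeated items never produce duplicate results.

```python
def lexicographic_permutations(seq):
    """Yield each distinct permutation of seq once, in lexicographic order."""
    a = sorted(seq)
    n = len(a)
    while True:
        yield tuple(a)
        # Find the rightmost position j where a[j] < a[j + 1].
        j = n - 2
        while j >= 0 and a[j] >= a[j + 1]:
            j -= 1
        if j < 0:
            # The sequence is in descending order: this was the last permutation.
            return
        # Find the rightmost element that is larger than a[j].
        k = n - 1
        while a[k] <= a[j]:
            k -= 1
        a[j], a[k] = a[k], a[j]
        # Reverse the tail so it becomes the smallest possible suffix.
        a[j + 1:] = a[j + 1:][::-1]

for p in lexicographic_permutations("aab"):
    print("".join(p))
```

With "aab" this prints aab, aba and baa, one per line, whereas itertools.permutations("aab") would return six tuples including duplicates.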

Related

for j in anagram(word[:i] + word[i+1:]): <- how it works?

I built an anagram generator. It works, but I don't understand how the for loop over a function call on line 8 works. Why does it work only with
for j in anagram(word[:i] + word[i+1:]):
and not with
for j in anagram(word):
Also, I want to know what
for j in anagram(...)
means and does.
What is j doing in this for loop?
This is my full code:
def anagram(word):
    n = len(word)
    anagrams = []
    if n <= 1:
        return word
    else:
        for i in range(n):
            for j in anagram(word[:i] + word[i+1:]):
                anagrams.append(word[i:i+1] + j)
        return anagrams
if __name__ == "__main__":
    print(anagram("abc"))
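The reason you can't write for j in anagram(word) is that it creates infinite recursion: the function would keep calling itself with the same argument and never get any closer to finishing.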
So for example if I write the recursive factorial function,
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)
This works and is not a circular definition because I am giving the computer two separate equations to compute the factorial:
n! = 1
n! = n (n-1)!
and I am telling it when to use each of these: the first one when n is 0 or 1, the second when n is larger than that. The key to its working is that eventually we stop using the second definition, and we instead use the first definition, which is called the “base case.” If I were to instead say another true definition like that n! = n! the computer would follow those instructions but we would never reduce down to the base case and so we would enter an infinite recursive loop. This loop would probably exhaust a resource called the “stack” rapidly, leading to errors about “excessive recursion” or too many “stack frames” or just “stack overflow” (for which this site is named!). And then if you gave it a mathematically invalid expression like n! = n n! it would infinitely loop and also it would be wrong even if it did not infinitely loop.
Factorials and anagrams are closely related; in fact, for a word f with no repeated letters we can say mathematically that
len(anagram(f)) == fact(len(f))
so solving one means solving the other. In this case we are saying that the anagram of a word which is empty or of length 1 is just [word], the list containing just that word. (Your algorithm gets this case slightly wrong: it returns the bare string word instead of the list [word], which is a bug.)
The anagram of any other word must have something to do with anagrams of words of length len(word) - 1. So what we do is we pull each character out of the word and put it at the front of the anagram. So word[:i] + word[i+1:] is the word except it is missing the letter at index i, and word[i:i+1] is the space between these -- in other words it is the letter at index i.
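Putting those two cases together, a corrected version might look like the sketch below (Python 3). The only intended change from the code in the question is that the base case returns a list, so the function returns a list for every input; the rest variable is just a name introduced here for readability.

```python
def anagram(word):
    # Base case: a word of length 0 or 1 has exactly one arrangement, itself.
    if len(word) <= 1:
        return [word]
    anagrams = []
    for i in range(len(word)):
        # The word with the letter at index i removed.
        rest = word[:i] + word[i+1:]
        for j in anagram(rest):
            # j is one anagram of the shorter word; put the removed letter in front.
            anagrams.append(word[i] + j)
    return anagrams

print(anagram("abc"))
print(anagram("a"))
```

Here anagram("a") returns ['a'] rather than the bare string 'a', which is what the original base case returned.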
This is NOT an answer but a guide for you to understand the logic by yourself.
First, you should understand one thing: anagram(word[:i] + word[i+1:]) is not the same as anagram(word).
>>> a = 'abcd'
>>> a[:2] + a[(2+1):]
'abd'
You can clearly see the difference.
For a clearer understanding, I would recommend printing the value of word at every step of the recursion: put a print(word) statement before the loop starts.
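That suggestion could look something like the sketch below (Python 3). The depth parameter is a hypothetical addition used only to indent the trace, and the base case returns a list here so the function always returns the same type; neither is part of the code in the question.

```python
def anagram(word, depth=0):
    # Print the argument of every call, indented by recursion depth.
    print("  " * depth + "anagram(" + repr(word) + ")")
    if len(word) <= 1:
        return [word]
    anagrams = []
    for i in range(len(word)):
        for j in anagram(word[:i] + word[i+1:], depth + 1):
            anagrams.append(word[i] + j)
    return anagrams

anagram("abc")
```

Every call receives a word that is one letter shorter than its caller's, which is why the recursion eventually reaches the base case.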

Python recursive permutation program explanation

I have some code that uses recursion to compute the permutations of the characters of a string. I understand ordinary tail recursion, and recursion for palindromes, factorials, and decimal-to-binary conversion, but I am having trouble understanding how this recursion works. I mean how it actually works in the background, not just the high-level abstract description, which I get.
Here is the code:
from __future__ import print_function
def permutef(s):
    #print('\nIM CALLED\n')
    out = []
    if len(s) == 1:
        out = [s]
    else:
        for i,let in enumerate(s):
            #print('LETTER IS {} index is {}'.format(let, i))
            #Slicing as not including that letter but includes every letter except that to perform the permutation
            for perm in permutef( s[:i] + s[i+1:] ):
                print(perm)
                out += [let + perm]
    return out
per = permutef('abc')
print('\n\n\n', per, '\n\n\n')
I traced it on paper: each circle is one letter, showing how the corresponding stack frames pop.
Don't ask about my handwriting, I know it's awesome (sarcasm).
Here is the output screenshot.
I want to understand the nitty-gritty of how this works in the background, but I can't seem to fathom the concept. Thanks very much in advance.
1   def permutef(s):
2       out = []
3       if len(s) == 1:
4           out = [s]
5       else:
6           for i,let in enumerate(s):
7               for perm in permutef( s[:i] + s[i+1:] ):
8                   print(perm)
9                   out += [let + perm]
10      return out
The principle is fairly straightforward. A one-character string (line 3) only has one permutation, represented by a list containing that character (line 4). The permutations of longer strings are generated by taking each character in the string and permuting the remaining characters - a fairly classic recursive divide-and-conquer approach.
For problems like this the Python Tutor site can be useful to visualise the execution of your code. The link I've provided is pre-loaded with the code above, and you can step forwards and backwards through the code until you understand how it works.

Anagrams code resulting in infinite results

I need to generate anagrams for an application. I am using the following code for generating anagrams
def anagrams(s):
    if len(s) < 2:
        return s
    else:
        tmp = []
        for i, letter in enumerate(s):
            for j in anagrams(s[:i]+s[i+1:]):
                tmp.append(j+letter)
                print (j+letter)
        return tmp
The code above works in general. However, it prints infinite results when the following string is passed
str = "zzzzzzziizzzz"
print anagrams(str)
Can someone tell me where I am going wrong? I need the unique anagrams of a string.
This is not an infinite number of results; it is 13!(*) words (a bit over 6 billion). You are facing a combinatorial explosion.
(*) 13 factorial.
Others have pointed out that your code produces 13! anagrams, many of them duplicates. Your string of 11 z's and 2 i's has only 78 unique anagrams, however. (That's 13! / (11!·2!) or 13·12 / 2.)
If you want only these strings, make sure that you don't recurse down for the same letter more than once:
def anagrams(s):
    if len(s) < 2:
        return s
    else:
        tmp = []
        for i, letter in enumerate(s):
            if letter not in s[:i]:
                for j in anagrams(s[:i] + s[i+1:]):
                    tmp.append(letter + j)
        return tmp
The additional test is probably not the most effective way to tell whether a letter has already been used, but in your case with many duplicate letters it will save a lot of recursions.
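One possible alternative to the slice test is to remember the letters already used at the current level in a set. The sketch below (Python 3) is only an illustration: the name unique_anagrams and the seen set are my own, and the base case returns a list instead of the bare string so the return type is consistent.

```python
from math import factorial

def unique_anagrams(s):
    if len(s) < 2:
        return [s]
    result = []
    seen = set()  # letters already tried as the first letter at this level
    for i, letter in enumerate(s):
        if letter in seen:
            continue  # the same first letter would only repeat earlier results
        seen.add(letter)
        for rest in unique_anagrams(s[:i] + s[i+1:]):
            result.append(letter + rest)
    return result

words = unique_anagrams("zzzzzzziizzzz")
print(len(words))                                        # 78
print(factorial(13) // (factorial(11) * factorial(2)))   # 78
```

Both lines print 78, matching the count worked out above for 11 z's and 2 i's.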
There aren't infinitely many results, just 13!, or 6,227,020,800.
You're just not waiting long enough for the 6 billion results.
Note that much of the output consists of duplicates. If you mean to leave out the duplicates, then the number of results is much smaller.

Recursive list building: permutations of all lengths

I've been having some trouble with recursion, especially with lists. I don't understand the process of 'building a list recursively' very well, since I don't really get where the lists are created. I have gotten this program to print out all the permutations of len(string), but I would like to extend it so that it'll also give me the permutations of length 1 to len(string) - 1. Here is what I have got:
def subset(string):
    result = []
    if len(string) == 0:
        result.append(string)
        return result
    else:
        for i in range(len(string)):
            shorterstring = string[ :i] + string[i+1: ]
            shortersets = subset(shorterstring)
            for s in shortersets:
                result.append(string[i] + s)
        return result
Which gives:
print(subset("rum"))
['rum', 'rmu', 'urm', 'umr', 'mru', 'mur']
I don't understand why, when I change result.append(string[i] + s) to just result.append(s), I get no output at all.
If you change result.append(string[i] + s) to result.append(s), no character is ever added to a string: the only string your code creates is the empty one from the base case (len(string) == 0), so result ends up holding nothing but empty strings.
For your code to generate permutations of every length, the last for loop needs to be:
for s in shortersets:
    result.append(string[i] + s)
    result.append(s)
Note that when used with the original code, you will actually end up adding multiple instances of the same permutations to the final output. You could fix this by making result a set instead of a list, but you might want to try rewriting your code to avoid this inefficiency altogether.
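One way such a rewrite might look is sketched below (Python 3). This is an assumption about what is wanted rather than the only approach: the function name permutations_of_all_lengths is made up, the empty string is deliberately left out of the result, and a set is used so repeated letters cannot introduce duplicates.

```python
def permutations_of_all_lengths(string):
    """Return the set of permutations of every length from 1 to len(string)."""
    result = set()
    for i in range(len(string)):
        first = string[i]
        shorter = string[:i] + string[i+1:]
        # The single letter is itself a permutation of length 1.
        result.add(first)
        # Prefix it to every shorter permutation built from the remaining letters.
        for s in permutations_of_all_lengths(shorter):
            result.add(first + s)
    return result

perms = permutations_of_all_lengths("rum")
print(sorted(perms, key=lambda p: (len(p), p)))
print(len(perms))
```

For "rum" this gives 3 permutations of length 1, 6 of length 2 and 6 of length 3, so 15 in total.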

Word segmentation using dynamic programming

So first off I'm very new to Python so if I'm doing something awful I'm prefacing this post with a sorry. I've been assigned this problem:
We want to devise a dynamic programming solution to the following problem: there is a string of characters which might have been a sequence of words with all the spaces removed, and we want to find a way, if any, in which to insert spaces that separate valid English words. For example, theyouthevent could be from “the you the vent”, “the youth event” or “they out he vent”. If the input is theeaglehaslande, then there’s no such way. Your task is to implement a dynamic programming solution in two separate ways:
iterative bottom-up version
recursive memoized version
Assume that the original sequence of words had no other punctuation (such as periods), no capital letters, and no proper names - all the words will be available in a dictionary file that will be provided to you.
So I'm having two main issues:
I know that this can and should be done in O(N^2), and I don't think mine is
The lookup table doesn't seem to be storing all the words, so it isn't reducing the time complexity
What I'd like:
Any kind of input (better way to do it, something you see wrong in the code, how I can get the lookup table working, how to use the table of booleans to build a sequence of valid words)
Some idea on how to tackle the recursive version although I feel once I am able to solve the iterative solution I will be able to engineer the recursive one from it.
As always thanks for any time and or effort anyone gives this, it is always appreciated.
Here's my attempt:
#dictionary function returns True if word is found in dictionary false otherwise
def dictW(s):
    diction = open("diction10k.txt",'r')
    for x in diction:
        x = x.strip("\n \r")
        if s == x:
            return True
    return False
def iterativeSplit(s):
    n = len(s)
    i = j = k = 0
    A = [-1] * n
    word = [""] * n
    booly = False
    for i in range(0, n):
        for j in range(0, i+1):
            prefix = s[j:i+1]
            for k in range(0, n):
                if word[k] == prefix:
                    #booly = True
                    A[k] = 1
                    #print "Array below at index k %d and word = %s"%(k,word[k])
                    #print A
            # print prefix, A[i]
            if(((A[i] == -1) or (A[i] == 0))):
                if (dictW(prefix)):
                    A[i] = 1
                    word[i] = prefix
                    #print word[i], i
                else:
                    A[i] = 0
    for i in range(0, n):
        print A[i]
For another real-world example of how to do English word segmentation, look at the source of the Python wordsegment module. It's a little more sophisticated because it uses word and phrase frequency tables, but its segment function illustrates the memoization approach:
def segment(text):
    "Return a list of words that is the best segmentation of `text`."
    memo = dict()
    def search(text, prev='<s>'):
        if text == '':
            return 0.0, []
        def candidates():
            for prefix, suffix in divide(text):
                prefix_score = log10(score(prefix, prev))
                pair = (suffix, prefix)
                if pair not in memo:
                    memo[pair] = search(suffix, prefix)
                suffix_score, suffix_words = memo[pair]
                yield (prefix_score + suffix_score, [prefix] + suffix_words)
        return max(candidates())
    result_score, result_words = search(clean(text))
    return result_words
If you replaced the score function so that it returned "1" for a word in your dictionary and "0" if not then you would simply enumerate all positively scored candidates for your answer.
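To make the enumeration idea concrete without wordsegment's scoring, here is a minimal recursive memoized sketch in Python 3. It is not taken from the wordsegment source: the function name all_segmentations, the tiny vocab set and the use of functools.lru_cache are all assumptions made for illustration. It treats a word as valid simply when it is in the given set.

```python
from functools import lru_cache

def all_segmentations(text, words):
    """Return every way to split text into words taken from the set words."""
    words = frozenset(words)

    @lru_cache(maxsize=None)
    def search(start):
        # Base case: nothing left to split, so there is one (empty) segmentation.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            prefix = text[start:end]
            if prefix in words:
                # Reuse the memoized segmentations of the remaining suffix.
                for rest in search(end):
                    results.append([prefix] + rest)
        return results

    return [" ".join(segmentation) for segmentation in search(0)]

# A tiny stand-in vocabulary; a real run would load the provided dictionary file.
vocab = {"the", "they", "you", "youth", "event", "vent", "out", "he",
         "eagle", "has", "land"}
print(all_segmentations("theyouthevent", vocab))
print(all_segmentations("theeaglehaslande", vocab))
```

With this vocabulary the first call finds the three segmentations from the problem statement and the second returns an empty list. Memoizing on the start index means each suffix is solved only once, although listing every segmentation can still be expensive when there are many of them.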
Here is the solution in C++. Read and understand the concept, and then implement.
This video is very helpful for understanding DP approach.
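As a rough illustration of the iterative bottom-up DP, here is a sketch in Python 3. The names segment_bottom_up, split_at and vocab are made up for this example, and words is assumed to be a set loaded once (for example from diction10k.txt) rather than re-reading the file on every lookup as dictW in the question does. The table records, for each prefix length, where the last word of some valid segmentation starts, which is enough to rebuild one sequence of words afterwards.

```python
def segment_bottom_up(text, words):
    """Return one valid segmentation of text as a list of words, or None."""
    n = len(text)
    # split_at[end] is the start index of the last word in some valid
    # segmentation of text[:end], or None if text[:end] cannot be segmented.
    split_at = [None] * (n + 1)
    split_at[0] = 0  # the empty prefix is trivially segmentable
    for end in range(1, n + 1):
        for start in range(end):
            if split_at[start] is not None and text[start:end] in words:
                split_at[end] = start
                break
    if split_at[n] is None:
        return None
    # Walk the table backwards to rebuild the words.
    result = []
    end = n
    while end > 0:
        start = split_at[end]
        result.append(text[start:end])
        end = start
    return result[::-1]

# A tiny stand-in vocabulary; a real run would load the provided dictionary file.
vocab = {"the", "they", "you", "youth", "event", "vent", "out", "he",
         "eagle", "has", "land"}
print(segment_bottom_up("theyouthevent", vocab))
print(segment_bottom_up("theeaglehaslande", vocab))
```

The two nested loops perform O(N^2) dictionary lookups. Each lookup also slices and hashes a substring, so the O(N^2) figure counts lookups rather than character operations.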
One more approach which I feel can help is the trie data structure. It may be a better way to solve the above problem.
