Finding all possible permutations of a given string in Python

I have a string. I want to generate all permutations from that string, by changing the order of characters in it. For example, say:
x='stack'
what I want is a list like this,
l=['stack','satck','sackt'.......]
Currently I am iterating over a list copy of the string, picking two letters at random and transposing them to form a new string, which I add to a set. Based on the length of the string, I calculate the number of possible permutations and keep iterating until the set reaches that size (a rough sketch of this approach is shown below).
There must be a better way to do this.
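Roughly, my current approach looks like this (a sketch only; it assumes all characters in the string are distinct):
import math
import random

def random_swap_permutations(x):
    # sketch of the random-transposition idea described above (hypothetical helper name)
    target = math.factorial(len(x))   # permutations expected when all characters are distinct
    seen = {x}
    chars = list(x)
    while len(seen) < target:
        i, j = random.sample(range(len(chars)), 2)   # pick two positions at random
        chars[i], chars[j] = chars[j], chars[i]      # transpose them
        seen.add(''.join(chars))
    return seen
This terminates eventually, but the number of iterations is unbounded.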

The itertools module has a useful method called permutations(). The documentation says:
itertools.permutations(iterable[, r])
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the
iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input
iterable is sorted, the permutation tuples will be produced in sorted
order.
You'll have to join your permuted letters as strings though.
>>> from itertools import permutations
>>> perms = [''.join(p) for p in permutations('stack')]
>>> perms
['stack', 'stakc', 'stcak', 'stcka', 'stkac', 'stkca', 'satck',
'satkc', 'sactk', 'sackt', 'saktc', 'sakct', 'sctak', 'sctka',
'scatk', 'scakt', 'sckta', 'sckat', 'sktac', 'sktca', 'skatc',
'skact', 'skcta', 'skcat', 'tsack', 'tsakc', 'tscak', 'tscka',
'tskac', 'tskca', 'tasck', 'taskc', 'tacsk', 'tacks', 'taksc',
'takcs', 'tcsak', 'tcska', 'tcask', 'tcaks', 'tcksa', 'tckas',
'tksac', 'tksca', 'tkasc', 'tkacs', 'tkcsa', 'tkcas', 'astck',
'astkc', 'asctk', 'asckt', 'asktc', 'askct', 'atsck', 'atskc',
'atcsk', 'atcks', 'atksc', 'atkcs', 'acstk', 'acskt', 'actsk',
'actks', 'ackst', 'ackts', 'akstc', 'aksct', 'aktsc', 'aktcs',
'akcst', 'akcts', 'cstak', 'cstka', 'csatk', 'csakt', 'cskta',
'cskat', 'ctsak', 'ctska', 'ctask', 'ctaks', 'ctksa', 'ctkas',
'castk', 'caskt', 'catsk', 'catks', 'cakst', 'cakts', 'cksta',
'cksat', 'cktsa', 'cktas', 'ckast', 'ckats', 'kstac', 'kstca',
'ksatc', 'ksact', 'kscta', 'kscat', 'ktsac', 'ktsca', 'ktasc',
'ktacs', 'ktcsa', 'ktcas', 'kastc', 'kasct', 'katsc', 'katcs',
'kacst', 'kacts', 'kcsta', 'kcsat', 'kctsa', 'kctas', 'kcast',
'kcats']
If you find yourself troubled by duplicates, try fitting your data into a structure with no duplicates like a set:
>>> perms = [''.join(p) for p in permutations('stacks')]
>>> len(perms)
720
>>> len(set(perms))
360
Thanks to @pst for pointing out that this is not what we'd traditionally think of as a type cast, but more of a call to the set() constructor.
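If you only ever want the unique strings, you can also build the set directly with a set comprehension (same idea, skipping the intermediate list):
>>> unique_perms = {''.join(p) for p in permutations('stacks')}
>>> len(unique_perms)
360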

You can get all n! permutations without much code:
def permutations(string, step=0):
    # if we've gotten to the end, print the permutation
    if step == len(string):
        print("".join(string))
    # everything to the right of step has not been swapped yet
    for i in range(step, len(string)):
        # copy the string (store as a list)
        string_copy = [character for character in string]
        # swap the current index with the step
        string_copy[step], string_copy[i] = string_copy[i], string_copy[step]
        # recurse on the portion of the string that has not been swapped yet
        # (its indices now begin at step + 1)
        permutations(string_copy, step + 1)
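As a quick check (using the function above as-is), calling it on a short string prints each arrangement; nothing is returned:
permutations("abc")
# abc
# acb
# bac
# bca
# cba
# cab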

Here is another way of generating string permutations with minimal code, based on backtracking.
We create a loop and keep swapping two characters at a time; inside the loop we recurse. Note that we only print when the index reaches the length of the string.
Example: ABC
i is our starting point and recursion parameter; j is the loop variable.
(The original answer included a diagram showing how it works, left to right and top to bottom, which is the order of the permutations.)
The code:
def permute(data, i, length):
    if i == length:
        print(''.join(data))
    else:
        for j in range(i, length):
            # swap
            data[i], data[j] = data[j], data[i]
            permute(data, i + 1, length)
            data[i], data[j] = data[j], data[i]

string = "ABC"
n = len(string)
data = list(string)
permute(data, 0, n)

Stack Overflow users have already posted some strong solutions, but I wanted to show yet another one, which I find more intuitive.
The idea is that for a given string we can recurse with the following algorithm (pseudo-code):
permutations = char + permutations(string - char) for char in string
I hope it helps someone!
def permutations(string):
    """
    Create all permutations of a string with non-repeating characters
    """
    permutation_list = []
    if len(string) == 1:
        return [string]
    else:
        for char in string:
            for a in permutations(string.replace(char, "", 1)):
                permutation_list.append(char + a)
    return permutation_list

Here's a simple function to return unique permutations:
def permutations(string):
    if len(string) == 1:
        return string
    recursive_perms = []
    for c in string:
        for perm in permutations(string.replace(c, '', 1)):
            recursive_perms.append(c + perm)
    return set(recursive_perms)

itertools.permutations is good, but it doesn't deal nicely with sequences that contain repeated elements. That's because internally it permutes the sequence indices and is oblivious to the sequence item values.
Sure, it's possible to filter the output of itertools.permutations through a set to eliminate the duplicates, but it still wastes time generating those duplicates, and if there are several repeated elements in the base sequence there will be lots of duplicates. Also, using a collection to hold the results wastes RAM, negating the benefit of using an iterator in the first place.
Fortunately, there are more efficient approaches. The code below uses the algorithm of the 14th century Indian mathematician Narayana Pandita, which can be found in the Wikipedia article on Permutation. This ancient algorithm is still one of the fastest known ways to generate permutations in order, and it is quite robust, in that it properly handles permutations that contain repeated elements.
def lexico_permute_string(s):
    ''' Generate all permutations in lexicographic order of string `s`

        This algorithm, due to Narayana Pandita, is from
        https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order

        To produce the next permutation in lexicographic order of sequence `a`

        1. Find the largest index j such that a[j] < a[j + 1]. If no such index exists,
        the permutation is the last permutation.
        2. Find the largest index k greater than j such that a[j] < a[k].
        3. Swap the value of a[j] with that of a[k].
        4. Reverse the sequence from a[j + 1] up to and including the final element a[n].
    '''
    a = sorted(s)
    n = len(a) - 1
    while True:
        yield ''.join(a)

        #1. Find the largest index j such that a[j] < a[j + 1]
        for j in range(n-1, -1, -1):
            if a[j] < a[j + 1]:
                break
        else:
            return

        #2. Find the largest index k greater than j such that a[j] < a[k]
        v = a[j]
        for k in range(n, j, -1):
            if v < a[k]:
                break

        #3. Swap the value of a[j] with that of a[k].
        a[j], a[k] = a[k], a[j]

        #4. Reverse the tail of the sequence
        a[j+1:] = a[j+1:][::-1]

for s in lexico_permute_string('data'):
    print(s)
Output:
aadt
aatd
adat
adta
atad
atda
daat
data
dtaa
taad
tada
tdaa
Of course, if you want to collect the yielded strings into a list you can do
list(lexico_permute_string('data'))
or in recent Python versions:
[*lexico_permute_string('data')]

Here is another approach, different from what @Adriano and @illerucis posted. It has a better runtime; you can verify that yourself by measuring the time:
def removeCharFromStr(str, index):
    endIndex = index if index == len(str) else index + 1
    return str[:index] + str[endIndex:]

# 'ab' -> a + 'b', b + 'a'
# 'abc' -> a + bc, b + ac, c + ab
#          a + cb, b + ca, c + ba
def perm(str):
    if len(str) <= 1:
        return {str}
    permSet = set()
    for i, c in enumerate(str):
        newStr = removeCharFromStr(str, i)
        retSet = perm(newStr)
        for elem in retSet:
            permSet.add(c + elem)
    return permSet
For an arbitrary string "dadffddxcf" it took 1.1336 s for the itertools permutations library, 9.125 s for this implementation, and 16.357 s for @Adriano's and @illerucis's version. Of course, you can still optimize it.

Here's a slightly improved version of illerucis's code for returning a list of all permutations of a string s with distinct characters (not necessarily in lexicographic sort order), without using itertools:
def get_perms(s, i=0):
    """
    Returns a list of all (len(s) - i)! permutations t of s where t[:i] = s[:i].
    """
    # To avoid memory allocations for intermediate strings, use a list of chars.
    if isinstance(s, str):
        s = list(s)

    # Base Case: 0! = 1! = 1.
    # Store the only permutation as an immutable string, not a mutable list.
    if i >= len(s) - 1:
        return ["".join(s)]

    # Inductive Step: (len(s) - i)! = (len(s) - i) * (len(s) - i - 1)!
    # Swap in each suffix character to be at the beginning of the suffix.
    perms = get_perms(s, i + 1)
    for j in range(i + 1, len(s)):
        s[i], s[j] = s[j], s[i]
        perms.extend(get_perms(s, i + 1))
        s[i], s[j] = s[j], s[i]
    return perms
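A quick usage check (my own example; the ordering follows the swap pattern rather than lexicographic order):
>>> get_perms("abc")
['abc', 'acb', 'bac', 'bca', 'cba', 'cab']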

See itertools.combinations or itertools.permutations.
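To see the difference at a glance (my own example; permutations respect order, combinations do not):
>>> from itertools import permutations, combinations
>>> [''.join(p) for p in permutations('abc', 2)]
['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
>>> [''.join(c) for c in combinations('abc', 2)]
['ab', 'ac', 'bc']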

Why don't you simply do:
from itertools import permutations

perms = [''.join(p) for p in permutations(['s', 't', 'a', 'c', 'k'])]
print(perms)
print(len(perms))
print(len(set(perms)))
You get no duplicates, as you can see:
['stack', 'stakc', 'stcak', 'stcka', 'stkac', 'stkca', 'satck', 'satkc',
'sactk', 'sackt', 'saktc', 'sakct', 'sctak', 'sctka', 'scatk', 'scakt', 'sckta',
'sckat', 'sktac', 'sktca', 'skatc', 'skact', 'skcta', 'skcat', 'tsack',
'tsakc', 'tscak', 'tscka', 'tskac', 'tskca', 'tasck', 'taskc', 'tacsk', 'tacks',
'taksc', 'takcs', 'tcsak', 'tcska', 'tcask', 'tcaks', 'tcksa', 'tckas', 'tksac',
'tksca', 'tkasc', 'tkacs', 'tkcsa', 'tkcas', 'astck', 'astkc', 'asctk', 'asckt',
'asktc', 'askct', 'atsck', 'atskc', 'atcsk', 'atcks', 'atksc', 'atkcs', 'acstk',
'acskt', 'actsk', 'actks', 'ackst', 'ackts', 'akstc', 'aksct', 'aktsc', 'aktcs',
'akcst', 'akcts', 'cstak', 'cstka', 'csatk', 'csakt', 'cskta', 'cskat', 'ctsak',
'ctska', 'ctask', 'ctaks', 'ctksa', 'ctkas', 'castk', 'caskt', 'catsk', 'catks',
'cakst', 'cakts', 'cksta', 'cksat', 'cktsa', 'cktas', 'ckast', 'ckats', 'kstac',
'kstca', 'ksatc', 'ksact', 'kscta', 'kscat', 'ktsac', 'ktsca', 'ktasc', 'ktacs',
'ktcsa', 'ktcas', 'kastc', 'kasct', 'katsc', 'katcs', 'kacst', 'kacts', 'kcsta',
'kcsat', 'kctsa', 'kctas', 'kcast', 'kcats']
120
120
[Finished in 0.3s]

def permute(seq):
    if not seq:
        yield seq
    else:
        for i in range(len(seq)):
            rest = seq[:i] + seq[i+1:]
            for x in permute(rest):
                yield seq[i:i+1] + x

print(list(permute('stack')))

All possible words from "stack":
from itertools import permutations

for i in permutations('stack'):
    print(''.join(i))
permutations(iterable, r=None)
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.
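For instance, because elements are distinguished by position rather than value, a string with repeated letters yields repeated output strings, which you may want to deduplicate:
>>> [''.join(p) for p in permutations('aab')]
['aab', 'aba', 'aab', 'aba', 'baa', 'baa']
>>> sorted(set(''.join(p) for p in permutations('aab')))
['aab', 'aba', 'baa']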

This is a recursive solution that generates all n! permutations and accepts duplicate elements in the string.
import math

def getFactors(root, num):
    sol = []
    # return condition
    if len(num) == 1:
        return [root + num]
    # looping in next iteration
    for i in range(len(num)):
        # creating a substring with all remaining chars except the one taken in this iteration
        if i > 0:
            rem = num[:i] + num[i+1:]
        else:
            rem = num[i+1:]
        # concatenating existing solutions with the solutions of this iteration
        sol = sol + getFactors(root + num[i], rem)
    return sol
I validated the solution by checking two properties: the number of permutations is n!, and the result contains no duplicates. So:
inpt = "1234"
results = getFactors("", inpt)

if len(results) != math.factorial(len(inpt)) or len(results) != len(set(results)):
    print("Wrong approach")
else:
    print("Correct Approach")

With a recursive approach:
def permute(word):
    if len(word) == 1:
        return [word]

    permutations = permute(word[1:])
    character = word[0]
    result = []
    for p in permutations:
        for i in range(len(p) + 1):
            result.append(p[:i] + character + p[i:])
    return result
Running the code:
>>> permute('abc')
['abc', 'bac', 'bca', 'acb', 'cab', 'cba']

Yet another intuitive, recursive solution. The idea is to select a letter as a pivot and then build the rest of the word around it.
def find_permutations(alphabet):
    words = []

    def permute(new_word, alphabet):
        if not alphabet:
            words.append(new_word)
        else:
            for i in range(len(alphabet)):
                permute(new_word=new_word + alphabet[i],
                        alphabet=alphabet[0:i] + alphabet[i+1:])

    permute('', alphabet)
    return words

# let us try it with 'abc'
a = 'abc'
for word in find_permutations(a):
    print(word)
Output:
abc
acb
bac
bca
cab
cba

Here's a really simple generator version:
def find_all_permutations(s, curr=[]):
    if len(s) == 0:
        yield curr
    else:
        for i, c in enumerate(s):
            for combo in find_all_permutations(s[:i] + s[i+1:], curr + [c]):
                yield "".join(combo)
I think it's not so bad!
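Example usage (my own check), materialising the generator into a list:
>>> list(find_all_permutations("abc"))
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']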

def f(s):
    if len(s) == 2:
        X = [s, (s[1] + s[0])]
        return X
    else:
        list1 = []
        for i in range(0, len(s)):
            Y = f(s[0:i] + s[i+1:len(s)])
            for j in Y:
                list1.append(s[i] + j)
        return list1

s = input()
z = f(s)
print(z)

Here's a simple and straightforward recursive implementation:
def stringPermutations(s):
    if len(s) < 2:
        yield s
        return

    for pos in range(0, len(s)):
        char = s[pos]
        permForRemaining = list(stringPermutations(s[0:pos] + s[pos+1:]))
        for perm in permForRemaining:
            yield char + perm

from itertools import permutations
perms = [''.join(p) for p in permutations('ABC')]
perms = [''.join(p) for p in permutations('stack')]

def perm(string):
    res = []
    for j in range(0, len(string)):
        if len(string) > 1:
            for i in perm(string[1:]):
                res.append(string[0] + i)
        else:
            return [string]
        string = string[1:] + string[0]
    return res

l = set(perm("abcde"))
This is one way to generate permutations with recursion; you can understand the code easily by tracing it on the strings 'a', 'ab' and 'abc' as input.
You get all N! permutations with this, without duplicates.

Everyone loves the smell of their own code. Just sharing the one I find the simplest:
def get_permutations(word):
    if len(word) == 1:
        yield word

    for i, letter in enumerate(word):
        for perm in get_permutations(word[:i] + word[i+1:]):
            yield letter + perm

This program does not eliminate duplicates, but I think it is one of the more efficient approaches:
s = input("Enter a string: ")
print("Permutations :\n", s)
size = len(s)
lis = list(range(0, size))
while True:
    k = -1
    while k > -size and lis[k-1] > lis[k]:
        k -= 1
    if k > -size:
        p = sorted(lis[k-1:])
        e = p[p.index(lis[k-1]) + 1]
        lis.insert(k-1, 'A')
        lis.remove(e)
        lis[lis.index('A')] = e
        lis[k:] = sorted(lis[k:])
        list2 = []
        for k in lis:
            list2.append(s[k])
        print("".join(list2))
    else:
        break

With recursion:
# swap ith and jth character of string
def swap(s, i, j):
    q = list(s)
    q[i], q[j] = q[j], q[i]
    return ''.join(q)

# recursive function
def _permute(p, s, permutes):
    if p >= len(s) - 1:
        permutes.append(s)
        return

    for i in range(p, len(s)):
        _permute(p + 1, swap(s, p, i), permutes)

# helper function
def permute(s):
    permutes = []
    _permute(0, s, permutes)
    return permutes

# TEST IT
s = "1234"
all_permute = permute(s)
print(all_permute)
With an iterative approach (using a stack):
# swap ith and jth character of string
def swap(s, i, j):
    q = list(s)
    q[i], q[j] = q[j], q[i]
    return ''.join(q)

# iterative function
def permute_using_stack(s):
    stk = [(0, s)]
    permutes = []

    while len(stk) > 0:
        p, s = stk.pop(0)
        if p >= len(s) - 1:
            permutes.append(s)
            continue

        for i in range(p, len(s)):
            stk.append((p + 1, swap(s, p, i)))

    return permutes

# TEST IT
s = "1234"
all_permute = permute_using_stack(s)
print(all_permute)
In lexicographically sorted order:
# swap ith and jth character of string
def swap(s, i, j):
    q = list(s)
    q[i], q[j] = q[j], q[i]
    return ''.join(q)

# finds the next lexicographic string if it exists, otherwise returns -1
def next_lexicographical(s):
    for i in range(len(s) - 2, -1, -1):
        if s[i] < s[i + 1]:
            m = s[i + 1]
            swap_pos = i + 1
            for j in range(i + 1, len(s)):
                if m > s[j] > s[i]:
                    m = s[j]
                    swap_pos = j
            if swap_pos != -1:
                s = swap(s, i, swap_pos)
                s = s[:i + 1] + ''.join(sorted(s[i + 1:]))
                return s
    return -1

# helper function
def permute_lexicographically(s):
    s = ''.join(sorted(s))
    permutes = []
    while True:
        permutes.append(s)
        s = next_lexicographical(s)
        if s == -1:
            break
    return permutes

# TEST IT
s = "1234"
all_permute = permute_lexicographically(s)
print(all_permute)

This code makes sense to me. The logic is to loop through all characters, extract the ith character, compute the permutations of the remaining elements, and prepend the ith character to each of them.
If I were asked to get all permutations of the string ABC manually, I would start with all arrangements beginning with element A:
A BC
A CB
Then all arrangements beginning with element B:
B AC
B CA
Then all arrangements beginning with element C:
C AB
C BA
def permute(s: str):
    n = len(s)
    if n == 1:
        return [s]
    if n == 2:
        return [s[0] + s[1], s[1] + s[0]]

    permutations = []
    for i in range(0, n):
        current = s[i]
        others = s[:i] + s[i+1:]
        otherPermutations = permute(others)
        for op in otherPermutations:
            permutations.append(current + op)
    return permutations

Simpler solution using permutations.
from itertools import permutations

def stringPermutate(s1):
    length = len(s1)
    if length < 2:
        return s1

    perm = [''.join(p) for p in permutations(s1)]
    return set(perm)

def permute_all_chars(chars, begin, end):
    if begin == end:
        print(chars)
        return

    for current_position in range(begin, end + 1):
        chars[begin], chars[current_position] = chars[current_position], chars[begin]
        permute_all_chars(chars, begin + 1, end)
        chars[begin], chars[current_position] = chars[current_position], chars[begin]

given_str = 'ABC'
chars = list(given_str)
permute_all_chars(chars, 0, len(chars) - 1)

The itertools module in the standard library has a function for this which is simply called permutations.
import itertools

def minion_game(s):
    vow = "aeiou"
    lsword = []
    ta = []
    for a in range(1, len(s) + 1):
        t = list(itertools.permutations(s, a))
        lsword.append(t)

    for i in range(0, len(lsword)):
        for xa in lsword[i]:
            if vow.startswith(xa):
                ta.append("".join(xa))

    print(ta)

minion_game("banana")

Related

LeetCode 5: Longest Palindromic Substring

I have been working on the LeetCode problem 5. Longest Palindromic Substring:
Given a string s, return the longest palindromic substring in s.
But I kept getting "Time Limit Exceeded" on large test cases.
I used dynamic programming as follows:
dp[(i, j)] = True means that the substring from s[i] to s[j] is a palindrome. So if s[i] == s[j] and dp[(i+1, j-1)] is True, then s[i] to s[j] is also a palindrome.
How can I improve the performance of this implementation?
class Solution:
    def longestPalindrome(self, s: str) -> str:
        dp = {}
        res = ""
        for i in range(len(s)):
            # single character is always a palindrome
            dp[(i, i)] = True
            res = s[i]

        # fill in the table diagonally
        for x in range(len(s) - 1):
            i = 0
            j = x + 1
            while j <= len(s) - 1:
                if s[i] == s[j] and (j - i == 1 or dp[(i+1, j-1)] == True):
                    dp[(i, j)] = True
                    if (j - i + 1) > len(res):
                        res = s[i:j+1]
                else:
                    dp[(i, j)] = False
                i += 1
                j += 1
        return res
I think the judging system for this problem is rather strict; it took some time to make it pass. Here is an improved version:
class Solution:
    def longestPalindrome(self, s: str) -> str:
        dp = {}
        res = ""
        for i in range(len(s)):
            dp[(i, i)] = True
            res = s[i]

        for x in range(len(s)):   # iterate till the end of the string
            for i in range(x):    # iterate up to the current index (less work), and a for loop reads better here
                if s[i] == s[x] and (dp.get((i + 1, x - 1), False) or x - i == 1):
                    dp[(i, x)] = True
                    if x - i + 1 > len(res):
                        res = s[i:x + 1]
        return res
Here is another idea to improve the performance:
The nested loop checks many cases where the DP value is already False for smaller ranges. We can avoid looking at large spans by searching for palindromes from the inside out and stopping as soon as the span is no longer a palindrome. This process must be repeated at every offset in the source string, but it can still save considerable processing.
The inputs on which the most time is wasted are those with long runs of the same letter, like "aaaaaaabcaaaaaaa". These lead to many iterations: each "a" or "aa" could be the center of a palindrome, but "growing" each of them separately is a waste of time. We should consider all consecutive "a" characters together from the start and expand from there.
You can specifically deal with these cases by first grouping consecutive letters which are the same. So the above example would be turned into 4 groups: a(7)b(1)c(1)a(7)
Then let each group in turn be taken as the center of a palindrome. For each group, "fan out" to potentially include one or more neighboring groups at both sides in "tandem". Continue fanning out until either the outside groups are not about the same letter, or they have a different group size. From that result you can derive what the largest palindrome is around that center. In particular, when the case is that the letters of the outer groups are the same, but not their sizes, you still include that letter at the outside of the palindrome, but with a repetition that corresponds to the least of these two mismatching group sizes.
Here is an implementation. I used named tuples to make it more readable:
from itertools import groupby
from collections import namedtuple

Group = namedtuple("Group", "letter,size,end")

class Solution:
    def longestPalindrome(self, s: str) -> str:
        longest = ""
        x = 0
        groups = [Group(group[0], len(group), x := x + len(group)) for group in
                  ("".join(group[1]) for group in groupby(s))]
        for i in range(len(groups)):
            for j in range(0, min(i + 1, len(groups) - i)):
                if groups[i - j].letter != groups[i + j].letter:
                    break
                left = groups[i - j]
                right = groups[i + j]
                if left.size != right.size:
                    break
            size = right.end - (left.end - left.size) - abs(left.size - right.size)
            if size > len(longest):
                x = left.end - left.size + max(0, left.size - right.size)
                longest = s[x:x+size]
        return longest
Alternatively, you can try this approach; it seems to be faster than 96% of Python submissions:
def longestPalindrome(self, s: str) -> str:
    N = len(s)
    if N == 0:
        return 0
    max_len, start = 1, 0
    for i in range(N):
        df = i - max_len
        if df >= 1 and s[df-1: i+1] == s[df-1: i+1][::-1]:
            start = df - 1
            max_len += 2
            continue
        if df >= 0 and s[df: i+1] == s[df: i+1][::-1]:
            start = df
            max_len += 1
    return s[start: start + max_len]
If you want to improve the performance, you should create a variable for len(s) at the beginning of the function and use it. That way instead of calling len(s) 3 times, you would do it just once.
Also, I see no reason to create a class for this function. A simple function will outrun a class method, albeit very slightly.

How can I loop through a Python list, stopping at every element

I am trying to solve this LeetCode problem without using sets:
https://leetcode.com/problems/longest-substring-without-repeating-characters/
The way I'm tackling it is by stopping at every element in the list and going one element to the right until I find a duplicate, storing that as the longest substring without duplication, and then continuing with the next element to do the same again.
I'm stuck because I can't find a way to iterate over the whole list starting at every element.
Below is what I have so far:
def lengthOfLongestSubstring(self, s):
    mylist = list(s)
    mylist2 = list(s)
    final_res = []
    res = []
    for i in mylist2:
        for char in mylist:
            if char in res:
                res = []
                break
            else:
                res.append(char)
                if len(res) > len(final_res):
                    final_res = res
    return final_res
If your goal is primarily to avoid using set (or dict), you can use a solution based on buckets for the full range of possible characters ("English letters, digits, symbols and spaces", which have numeric codes in the range 0-127):
class Solution(object):
    def lengthOfLongestSubstring(self, s):
        '''
        at a given index i with char c,
        if the value for c in the list of buckets b
        has a value (index of most recent sighting) > start,
        update start to be that value + 1
        otherwise update res to be max(res, i - start + 1)
        update the value for c in b to be i
        '''
        res, start, b = 0, 0, [-1] * 128
        for i, c in enumerate(s):
            k = ord(c)
            if b[k] >= start:
                start = b[k] + 1
            else:
                res = max(res, i - start + 1)
            b[k] = i
        return res
The call to ord() allows integer indexing of the bucketed list b.
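A quick sanity check (example inputs of my own; the LeetCode problem asks for the length, not the substring itself):
>>> Solution().lengthOfLongestSubstring("abcabcbb")
3
>>> Solution().lengthOfLongestSubstring("pwwkew")
3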

Time Complexity and Space Complexity of the Python Code

Can someone please help me with the time and space complexity of this code snippet? It is for the LeetCode question Word Break II: Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences.
def wordBreak(self, s, wordDict):
    dp = {}
    def word_break(s):
        if s in dp:
            return dp[s]
        result = []
        for w in wordDict:
            if s[:len(w)] == w:
                if len(w) == len(s):
                    result.append(w)
                else:
                    tmp = word_break(s[len(w):])
                    for t in tmp:
                        result.append(w + " " + t)
        dp[s] = result
        return result
    return word_break(s)
Think of it as a decision tree.
Time complexity:
The height of the tree is len(target) = m in the worst case.
The branching factor is len(words) = n.
In each recursive call you make a slice s[:len(w)], which is O(m) in the worst case.
So T = O(n^m * m).
Space complexity:
The space from the call stack is the height of the tree, i.e. m.
In each recursive call you build and store a suffix whose length is up to m.
So S = O(m * m).
For time complexity analysis, we almost always take the worst-case scenario into account.
I'd guess O(N^3) runtime and O(N) memory is right.
Here is a breakdown:
def wordBreak(self, s, wordDict):
    dp = {}
    def word_break(s):
        # This block would be O(N)
        if s in dp:                      # order of N
            return dp[s]
        result = []
        # This block would be O(N ^ 2)
        for w in wordDict:               # order of N
            if s[:len(w)] == w:
                if len(w) == len(s):
                    result.append(w)
                else:
                    tmp = word_break(s[len(w):])
                    for t in tmp:        # order of N
                        result.append(w + " " + t)
        dp[s] = result
        return result
    return word_break(s)                 # This'd be O(N) itself
Therefore, the O(N^2) of the second block is multiplied by the O(N) of calling word_break(s), which eventually makes it O(N^3).
We can simplify it a bit to:
class Solution:
    def wordBreak(self, s, words):
        dp = [False] * len(s)
        for i in range(len(s)):          # order of N
            for word in words:           # order of N
                k = i - len(word)
                if word == s[k + 1:i + 1] and (dp[k] or k == -1):  # order of N
                    dp[i] = True
        return dp[-1]
which would make it easier to see:
for i in range(len(s)): is an order of N;
for word in words: is an order of N;
if word == s[k + 1:i + 1] and (dp[k] or k == -1): is an order of N;
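As a quick check of the simplified version (example inputs of my own; it only reports whether s can be segmented, not the sentences):
>>> Solution().wordBreak("leetcode", ["leet", "code"])
True
>>> Solution().wordBreak("catsand", ["cat", "cats", "dog"])
False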

Find/extract a sequence of integers within a list in Python

I want to find a sequence of n consecutive integers within a sorted list and return that sequence. This is the best I can figure out (for n = 4), and it doesn't allow the user to specify an n.
my_list = [2,3,4,5,7,9]

for i in range(len(my_list)):
    if my_list[i+1] == my_list[i]+1 and my_list[i+2] == my_list[i]+2 and my_list[i+3] == my_list[i]+3:
        my_sequence = list(range(my_list[i], my_list[i]+4))

# my_sequence should end up as [2, 3, 4, 5]
I just realized this code doesn't work and returns an "index out of range" error, so I'll have to mess with the range of the for loop.
Here's a straight-forward solution. It's not as efficient as it might be, but it will be fine unless you have very long lists:
myarray = [2,5,1,7,3,8,1,2,3,4,5,7,4,9,1,2,3,5]

for idx, a in enumerate(myarray):
    if myarray[idx:idx+4] == [a, a+1, a+2, a+3]:
        print([a, a+1, a+2, a+3])
        break
Create a nested master result list, then go through my_sorted_list and add each item to either the last list in the master (if discontinuous) or to a new list in the master (if continuous):
>>> my_sorted_list = [0,2,5,7,8,9]
>>> my_sequences = []
>>> for idx, item in enumerate(my_sorted_list):
...     if not idx or item-1 != my_sequences[-1][-1]:
...         my_sequences.append([item])
...     else:
...         my_sequences[-1].append(item)
...
>>> max(my_sequences, key=len)
[7, 8, 9]
A short and concise way is to append to an array every time the next integer equals the current integer plus 1 (until the array already holds N consecutive numbers); for anything else, empty the array:
arr = [4,3,1,2,3,4,5,7,5,3,2,4]
N = 4
newarr = []
for i in range(len(arr)-1):
    if arr[i]+1 == arr[i+1]:
        newarr += [arr[i]]
        if len(newarr) == N:
            break
    else:
        newarr = []
When the code is run, newarr will be:
[1, 2, 3, 4]
# size = length of the sequence
# span = the gap between neighbouring integers
# the time complexity is O(n)
def extractSeq(lst, size, span=1):
    lst_size = len(lst)
    if lst_size < size:
        return []
    for i in range(lst_size - size + 1):
        for j in range(size - 1):
            if lst[i + j] + span == lst[i + j + 1]:
                continue
            else:
                i += j
                break
        else:
            return lst[i:i+size]
    return []
mylist = [2,3,4,5,7,9]
for j in range(len(mylist)):
    m = mylist[j]
    idx = j
    c = j
    for i in range(j, len(mylist)):
        if mylist[i] < m:
            m = mylist[i]
            idx = c
        c += 1
    tmp = mylist[j]
    mylist[j] = m
    mylist[idx] = tmp
print(mylist)

Filter list of strings, ignoring substrings of other items

How can I filter a list containing strings and substrings so that only the longest strings are returned? (If any item in the list is a substring of another, return only the longer string.)
I have this function. Is there a faster way?
def filterSublist(lst):
    uniq = lst
    for elem in lst:
        uniq = [x for x in uniq if (x == elem) or (x not in elem)]
    return uniq

lst = ["a", "abc", "b", "d", "xy", "xyz"]
print filterSublist(lst)

> ['abc', 'd', 'xyz']
> Function time: 0.000011
A simple quadratic time solution would be this:
res = []
n = len(lst)
for i in xrange(n):
    if not any(i != j and lst[i] in lst[j] for j in xrange(n)):
        res.append(lst[i])
But we can do much better:
Let $ be a character that does not appear in any of your strings and has a lower value than all your actual characters.
Let S be the concatenation of all your strings, with $ in between. In your example, S = a$abc$b$d$xy$xyz.
You can build the suffix array of S in linear time. You can also use a much simpler O(n log^2 n) construction algorithm that I described in another answer.
Now for every string in lst, check if it occurs in the suffix array exactly once. You can do two binary searches to find the locations of the substring, they form a contiguous range in the suffix array. If the string occurs more than once, you remove it.
With LCP information precomputed, this can be done in linear time as well.
Example O(n log^2 n) implementation, adapted from my suffix array answer:
def findFirst(lo, hi, pred):
    """ Find the first i in range(lo, hi) with pred(i) == True.
        Requires pred to be monotone. If there is no such i, return hi. """
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid): hi = mid
        else: lo = mid + 1
    return lo

# uses the algorithm described in https://stackoverflow.com/a/21342145/916657
class SuffixArray(object):
    def __init__(self, s):
        """ build the suffix array of s in O(n log^2 n) where n = len(s). """
        n = len(s)
        log2 = 0
        while (1 << log2) < n:
            log2 += 1
        rank = [[0]*n for _ in xrange(log2)]
        for i in xrange(n):
            rank[0][i] = s[i]
        L = [0]*n
        for step in xrange(1, log2):
            length = 1 << step
            for i in xrange(n):
                L[i] = (rank[step - 1][i],
                        rank[step - 1][i + length // 2] if i + length // 2 < n else -1,
                        i)
            L.sort()
            for i in xrange(n):
                rank[step][L[i][2]] = rank[step][L[i - 1][2]] if i > 0 and L[i][:2] == L[i-1][:2] else i
        self.log2 = log2
        self.rank = rank
        self.sa = [l[2] for l in L]
        self.s = s
        self.rev = [0]*n
        for i, j in enumerate(self.sa):
            self.rev[j] = i

    def lcp(self, x, y):
        """ compute the longest common prefix of s[x:] and s[y:] in O(log n). """
        n = len(self.s)
        if x == y:
            return n - x
        ret = 0
        for k in xrange(self.log2 - 1, -1, -1):
            if x >= n or y >= n:
                break
            if self.rank[k][x] == self.rank[k][y]:
                x += 1 << k
                y += 1 << k
                ret += 1 << k
        return ret

    def compareSubstrings(self, x, lx, y, ly):
        """ compare substrings s[x:x+lx] and s[y:y+ly] in O(log n). """
        l = min((self.lcp(x, y), lx, ly))
        if l == lx == ly: return 0
        if l == lx: return -1
        if l == ly: return 1
        return cmp(self.s[x + l], self.s[y + l])

    def count(self, x, l):
        """ count occurrences of substring s[x:x+l] in O(log n). """
        n = len(self.s)
        cs = self.compareSubstrings
        lo = findFirst(0, n, lambda i: cs(self.sa[i], min(l, n - self.sa[i]), x, l) >= 0)
        hi = findFirst(0, n, lambda i: cs(self.sa[i], min(l, n - self.sa[i]), x, l) > 0)
        return hi - lo

    def debug(self):
        """ print the suffix array for debugging purposes. """
        for i, j in enumerate(self.sa):
            print str(i).ljust(4), self.s[j:], self.lcp(self.sa[i], self.sa[i-1]) if i > 0 else "n/a"

def filterSublist(lst):
    splitter = "\x00"
    s = splitter.join(lst) + splitter
    sa = SuffixArray(s)
    res = []
    offset = 0
    for x in lst:
        if sa.count(offset, len(x)) == 1:
            res.append(x)
        offset += len(x) + 1
    return res
However, the interpretation overhead likely causes this to be slower than the O(n^2) approaches unless S is really large (in the order of 10^5 characters or more).
You can build your problem in matrix form as:
import numpy as np

lst = np.array(["a", "abc", "b", "d", "xy", "xyz"], object)
out = np.zeros((len(lst), len(lst)), dtype=int)
for i in range(len(lst)):
    for j in range(len(lst)):
        out[i,j] = lst[i] in lst[j]
from where you get out as:
array([[1, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1]])
then, the answer will be the indices of lst where the sum of out along the columns is 1 (the string is only in itself):
lst[out.sum(axis=1)==1]
#array(['abc', 'd', 'xyz'], dtype=object)
EDIT:
You can do it much more efficiently with:
from numpy.lib.stride_tricks import as_strided
from string import find

size = len(lst)
a = np.char.array(lst)
a2 = np.char.array(as_strided(a, shape=(size, size),
                              strides=(a.strides[0], 0)))
out = find(a2, a)
out[out==0] = 1
out[out==-1] = 0
print a[out.sum(axis=0)==1]
# chararray(['abc', 'd', 'xyz'], dtype='|S3')
a[out.sum(axis=0)==1]
Does the order matter? If not,
a = ["a", "abc", "b", "d", "xy", "xyz"]
a.sort(key=len, reverse=True)
n = len(a)
for i in range(n - 1):
    if a[i]:
        for j in range(i + 1, n):
            if a[j] in a[i]:
                a[j] = ''
print filter(len, a)  # ['abc', 'xyz', 'd']
Not very efficient, but simple.
O(n) solution:
import collections

def filterSublist1(words):
    longest = collections.defaultdict(str)
    for word in words:                              # O(n)
        for i in range(1, len(word)+1):             # O(k)
            for j in range(len(word) - i + 1):      # O(k)
                subword = word[j:j+i]               # O(1)
                if len(longest[subword]) < len(word):  # O(1)
                    longest[subword] = word         # O(1)
    return list(set(longest.values()))              # O(n)
    # Total: O(nk²)
Explanation:
To understand the time complexity, I've given an upper bound for the complexity of each line in the above code. The bottleneck is the nested for loops, so the overall time complexity is O(nk²), where n is the number of words in the list and k is the length of the average/longest word (e.g. in the above code n = 6 and k = 3). However, assuming the words are not arbitrarily long strings, we can bound k by some small value, e.g. k = 5 if we consider the average word length in the English dictionary. Therefore, because k is bounded, it is not included in the time complexity, and we get a running time of O(n). Of course, the size of k adds to the constant factor, especially if k is not much smaller than n. For the English dictionary, this means that once n >> k² = 25 you would start to see better results than with the other algorithms (see the graph below).
The algorithm works by mapping each unique subword to the longest string that contains that subword. So for example, 'xyz' would lookup longest['x'], longest['y'], longest['z'], longest['xy'], longest['yz'], and longest['xyz'] and set them all equal to 'xyz'. When this is done for every word in the list, longest.keys() will be a set of all unique subwords of all words, and longest.values() will be a list of only the words that are not subwords of other words. Finally, longest.values() can contain duplicates, so they are removed by wrapping in set.
Visualizing Time Complexity
Below I've tested the algorithm above (along with your original algorithm) to show that this solution is indeed O(n) when using English words. I've tested this using timeit on a list of up to 69000 English words. I've labeled this algorithm filterSublist1 and your original algorithm filterSublist2.
The plot (not reproduced here) is on log-log axes, meaning that the time complexity of each algorithm for this input set is given by the slope of its line. For filterSublist1 the slope is ~1, meaning it is O(n^1), and for filterSublist2 the slope is ~2, meaning it is O(n^2).
NOTE: I'm missing the filterSublist2() time for 69000 words because I didn't feel like waiting.
