permutation calculation runtime complexity with some changes - python

I have a question about the runtime complexity of the standard permutation-finding algorithm. Given a list A, find (and print) all permutations of its elements.
Here's my recursive implementation, where printperm() prints every permutation:
def printperm(A, p):
    if len(A) == len(p):
        print("".join(p))
        return
    for i in range(0, len(A)):
        if A[i] != 0:
            tmp = A[i]       # remember ith position
            A[i] = 0         # mark character i as used
            p.append(tmp)    # select character i for this permutation
            printperm(A, p)  # solve the subproblem, which is smaller because one more character is marked as used
            p.pop()          # done with selecting character i for this permutation
            A[i] = tmp       # restore character i in preparation for selecting the next available character

printperm(['a', 'b', 'c', 'd'], [])
The runtime complexity appears to be O(n!) where n is the size of A. This is because at each recursion level, the amount of work decreases by 1. So, the top recursion level is n amount of work, the next level is n-1, and the next level is n-2, and so on. So the total complexity is n*(n-1)*(n-2)...=n!
Now the problem is the print("".join(p)) statement. Every time this line runs, it iterates through the entire list, which is complexity n. There are n! permutations of a list of size n, so the total amount of work done by the print("".join(p)) statement is n!*n.
Does the presence of the print("".join(p)) statement then increase the runtime complexity to O(n * n!)? But this doesn't seem right, because I'm not running the print statement on every recursive call. Where does my logic for getting O(n * n!) break down?

You're basically right! The possible confusion comes in your "... and the next level is n-2, and so on". "And so on" is glossing over that at the very bottom level of the recursion, you're not doing O(1) work, but rather O(n) work to do the print. So the total complexity is proportional to
n * (n-1) * (n-2) ... * 2 * n
which equals n! * n. Note that the .join() doesn't really matter to this. It would also take O(n) work to simply print(p).
EDIT: But that's not really right, for a different reason. At all levels above the print level, you're doing
for i in range(0, len(A)):
and len(A) doesn't change. So every level is doing O(n) work. To be sure, the deeper the level the more zeroes there are in A, and so the less work the loop does, but it's nevertheless still O(n) merely to iterate over range(n) at all.
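To make the EDIT concrete, here is a small instrumented variant (my own sketch, not part of the original answer) that counts how often the loop body runs and how many characters the print("".join(p)) calls would emit, without actually printing anything. The helper names are made up for illustration.
from math import factorial

def count_work(A):
    # Counts the work done by printperm without printing:
    # - loop_iterations: how many times the inner for-loop body runs
    # - printed_chars: total characters the print("".join(p)) lines would emit
    loop_iterations = 0
    printed_chars = 0

    def rec(A, p):
        nonlocal loop_iterations, printed_chars
        if len(A) == len(p):
            printed_chars += len(p)  # stands in for the O(n) cost of "".join(p) and print
            return
        for i in range(len(A)):
            loop_iterations += 1
            if A[i] != 0:
                tmp = A[i]
                A[i] = 0
                p.append(tmp)
                rec(A, p)
                p.pop()
                A[i] = tmp

    rec(list(A), [])
    return loop_iterations, printed_chars

for n in range(2, 7):
    loops, chars = count_work(list("abcdefg")[:n])
    print(n, loops, chars, n * factorial(n))  # chars is exactly n * n!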

Permutations can be generated from combinations using a divide-and-conquer technique to achieve better time complexity. However, the generated output is not in lexicographic order.
Suppose we want to generate permutations for n=8 {0,1,2,3,4,5,6,7}.
Below are instances of permutations
01234567
01234576
03215674
30126754
Note that the first 4 items in each arrangement come from the same set {0,1,2,3} and the last 4 items come from the set {4,5,6,7}.
This means that given the sets A={0,1,2,3} and B={4,5,6,7}, there are 4! permutations of A and 4! permutations of B.
It follows that concatenating any permutation of A with any permutation of B gives a valid permutation of the universal set {0,1,2,3,4,5,6,7}.
Given sets A and B such that A union B equals the universal set and A intersection B is empty, the total number of permutations obtained this way is 2*|A|!*|B|! (the factor 2 comes from swapping the order of A and B).
In this case we get 2*4!*4! = 1152.
So to generate all permutations when n=8, we simply need to generate all pairs of sets A and B that satisfy the conditions explained earlier.
Generating combinations of 8 elements taken 4 at a time in lexicographical order is pretty fast. We only need to generate the first half of those combinations; each one is assigned to set A, and set B is obtained as its complement using a set operation.
For each pair A and B we apply the divide-and-conquer algorithm.
A and B don't need to be of equal size, and the trick can be applied recursively for higher values of n.
Instead of taking 8! = 40320 steps, this method takes 8!/(2*4!*4!) * (4!+4!+1) = 35 * 49 = 1715 steps. I ignore the cross-product operation when joining the set A and set B permutations because it is a straightforward operation.
In a real implementation this method runs approximately 20 times faster than a naive algorithm and can take advantage of parallelism. For example, different pairs of sets A and B can be grouped and processed on different threads.
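A minimal sketch of the idea in Python, assuming an even number of distinct elements (as in the n=8 example). The function name and structure are mine and are only meant to illustrate the counting argument, not to reproduce the poster's implementation. Fixing the first element inside A ensures each unordered pair {A, B} is generated once, and yielding both pa+pb and pb+pa supplies the factor of 2.
from itertools import combinations, permutations, product

def perms_via_split(universe):
    # Split the universe into a half-sized set A (always containing the first
    # element) and its complement B, permute each half independently, and
    # concatenate the results in both orders.
    universe = list(universe)
    n = len(universe)
    assert n % 2 == 0, "this sketch assumes an even number of distinct elements"
    half = n // 2
    first, rest = universe[0], universe[1:]
    for a_rest in combinations(rest, half - 1):
        A = (first,) + a_rest
        B = tuple(x for x in universe if x not in A)
        for pa, pb in product(permutations(A), permutations(B)):
            yield pa + pb
            yield pb + pa

# Sanity check on a small universe:
out = set(perms_via_split([0, 1, 2, 3]))
assert len(out) == 24  # 3 pairs * 2 * 2! * 2! = 24 = 4!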

Related

Fast Python algorithm for random partitioning with subset sums equal or close to given ratios

This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio. I want to divide a list of numbers so that the ratios of the subset sums equal given values. The difference is that now I have a long list of 200 numbers, so enumeration is infeasible. Note that although there are of course duplicate numbers in the list, every number is distinguishable.
import random
lst = [random.randrange(10) for _ in range(200)]
In this case, I want a function to stochastically sample a certain number of partitions with subset sums equal or close to the given ratios. This means that the solution can be sub-optimal, but I need the algorithm to be fast enough. I guess a greedy algorithm will do. That being said, it would of course be even better if there were a relatively fast algorithm that gives the optimal solution.
For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such a long list. The function should be used like this:
partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)
To test the solution, you can do something like:
from math import isclose
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps)
            for part in partitions for i, x in enumerate(part)])
Any suggestions?
You can use a greedy heuristic where you generate each partition from one of num_gen random permutations of the list. Each random permutation is partitioned into len(ratios) contiguous sublists. The fact that the partition subsets are sublists of a permutation makes enforcing the ratio condition very easy to do during sublist generation: as soon as the sum of the sublist we are currently building reaches one of the ratios, we "complete" the sublist, add it to the partition, and start creating a new sublist. We can do this in one pass through the entire permutation, giving us the following algorithm of time complexity O(num_gen * len(lst)).
M = 100
N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)
for _ in range(M):
    # get a new random permutation
    random.shuffle(lst)
    partition = []
    # starting index (in the permutation) of the current sublist
    lo = 0
    # permutation partial sum
    s = 0
    # index of sublist we are currently generating (i.e. what ratio we are on)
    j = 0
    # ratio partial sum
    rs = ratios[j]
    for i in range(N):
        s += lst[i]
        # if ratio of permutation partial sum exceeds ratio of ratio partial sum,
        # the current sublist is "complete"
        if s / S >= rs / R:
            partition.append(lst[lo:i + 1])
            # start creating new sublist from next element
            lo = i + 1
            j += 1
            if j == P:
                # done with partition
                # remaining elements will always all be zeroes
                # (i.e. assert should never fail)
                assert all(x == 0 for x in lst[i + 1:])
                partition[-1].extend(lst[i + 1:])
                break
            rs += ratios[j]
Note that the outer loop can be redesigned to loop indefinitely until num_gen good partitions are generated (rather than just looping num_gen times) for more robustness. This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) if the number of good partitions is not too small compared to the total number of partitions of the same size, so it should perform well for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05, as is evident by running the example below. Of course, how well the algorithm performs will also depend on the definition of 'good' -- the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle eventually produces all permutations of the input list).
You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.
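For convenience, here is the same algorithm wrapped into the func(numbers, ratios, num_gen) signature the question asks for. This is my own sketch of the answer's loop (variable names follow the code above); it shuffles a copy of the input so the caller's list is left untouched.
import random

def func(numbers, ratios, num_gen):
    numbers = list(numbers)          # work on a copy so the caller's list is not shuffled
    R, S, P = sum(ratios), sum(numbers), len(ratios)
    partitions = []
    for _ in range(num_gen):
        random.shuffle(numbers)
        partition, lo, s, j, rs = [], 0, 0, 0, ratios[0]
        for i, x in enumerate(numbers):
            s += x
            if s / S >= rs / R:      # current sublist has reached its ratio target
                partition.append(numbers[lo:i + 1])
                lo = i + 1
                j += 1
                if j == P:           # done with this partition
                    partition[-1].extend(numbers[i + 1:])
                    break
                rs += ratios[j]
        partitions.append(partition)
    return partitions

# usage as in the question (lst as defined above):
partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)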

How do I calculate Time Complexity for this particular algorithm?

I know there are many other questions out there asking for the general guide of how to calculate the time complexity, such as this one.
From them I have learnt that when there is a loop, such as the (for... if...) in my Python programme, the Time complexity is N * N where N is the size of input. (please correct me if this is also wrong) (Edited once after being corrected by an answer)
# greatest common divisor of two integers
a, b = map(int, input().split())
list = []
for i in range(1, a+b+1):
    if a % i == 0 and b % i == 0:
        list.append(i)
n = len(list)
print(list[n-1])
However, do other parts of the code also contribute to the time complexity, making it more than a simple O(n) = N^2? For example, in the second loop, where I am finding the common divisors of both a and b (a % i == 0), is there a way to know how many machine instructions the computer will execute in finding all the divisors, and the consequent time complexity of this specific loop?
I hope the question makes sense; apologies if it is not clear enough.
Thanks for answering.
First, a few hints:
In your code there is no nested loop. The if-statement does not constitute a loop.
Not all nested loops have quadratic time complexity.
Writing O(n) = N*N doesn't make any sense: what is n and what is N? Why does n appear on the left but N is on the right? You should expect your time complexity function to be dependent on the input of your algorithm, so first define what the relevant inputs are and what names you give them.
Also, O(n) is a set of functions (namely those asymptotically bounded from above by the function f(n) = n), whereas f(N) = N*N is one function. By abuse of notation, we conventionally write n*n = O(n) to mean n*n ∈ O(n) (which is a mathematically false statement), but switching the sides (O(n) = n*n) is undefined. A mathematically correct statement would be n = O(n*n).
You can assume all (fixed bit-length) arithmetic operations to be O(1), since there is a constant upper bound to the number of processor instructions needed. The exact number of processor instructions is irrelevant for the analysis.
Let's look at the code in more detail and annotate it:
a, b = map(int, input().split())   # O(1)
list = []                          # O(1)
for i in range(1, a+b+1):          # O(a+b) multiplied by what's inside the loop
    if a % i == 0 and b % i == 0:  # O(1)
        list.append(i)             # O(1) (amortized)
n = len(list)                      # O(1)
print(list[n-1])                   # O(log(a+b))
So what's the overall complexity? The dominating part is indeed the loop (the stuff before and after is negligible, complexity-wise), so it's O(a+b), if you take a and b to be the input parameters. (If you instead wanted to take the length N of your input input() as the input parameter, it would be O(2^N), since a+b grows exponentially with respect to N.)
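To see the linear dependence on a + b concretely, here is a small instrumented version of the loop (my own sketch; the function name is made up). Doubling a + b doubles the iteration count:
def gcd_by_trial_division(a, b):
    iterations = 0
    divisors = []
    for i in range(1, a + b + 1):
        iterations += 1
        if a % i == 0 and b % i == 0:
            divisors.append(i)
    return divisors[-1], iterations

print(gcd_by_trial_division(12, 18))  # (6, 30): gcd 6, 30 iterations
print(gcd_by_trial_division(24, 36))  # (12, 60): twice a+b, twice the iterations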
One thing to keep in mind, and you have the right idea, is that higher-degree terms take precedence. So if you have a step that is constant, O(1), but it happens n times, then it will be O(1) * O(N) = O(N).
Your program is O(N) because the only thing really affecting the time complexity is the loop, and as you know a simple loop like that is O(N) because it increases linearly as n increases.
Now if you had a nested loop that had both loops increasing as n increased, then it would be O(n^2).

Count all possible triangles

I have 3 lists each containing some integers.
Now I have to find out how many triangles I can make such that each side of the triangle comes from a different list.
A = [3,1,2]
B = [1,5]
C = [2,4,1]
Possible Triangles:
3,5,4
1,1,1
2,1,2
2,5,4
So answer should be 4.
I tried using three loops and the triangle property that the sum of any two sides is always greater than the third. But the time complexity for this is O(n^3). I want something faster.
count = 0
for i in range(len(A)):
    for j in range(len(B)):
        for k in range(len(C)):
            if (A[i]+B[j] > C[k]) and (A[i]+C[k] > B[j]) and (C[k]+B[j] > A[i]):
                count += 1
print(count)
What's the O of this code?
import itertools as it
[sides for sides in it.product(A, B, C) if max(sides) < sum(sides) / 2.]
Edit: Jump to the bottom for an O(n^2) solution I found later, but which builds on the ideas in my previous solutions.
Solution 1: use binary search to get O(n^2 log(n))
Pseudo code:
C_s = sort C in ascending order
for a in A:
    for b in B:
        use binary search to find the highest index c_low in C_s s.t. C_s[c_low] <= |a-b|
        use binary search to find lowest index c_high in C_s s.t. C_s[c_high] >= a+b
        count += (c_high-c_low-1)
Explanation and demonstration of correctness:
given the pair (a,b):
values of c from C_s[0] to C_s[c_low] are too short to make a triangle
values of c from C_s[c_high] to C_s[end] are too large
there are c_high-c_low-1 values of c between c_high and c_low, excluding the end points themselves, that will make a valid triangle with sides (a,b,c)
I assume the standard binary search behaviour of returning an index just past either end if all values in C_s are either > |a-b| or < a+b, as if C_s[-1] = -infinity and C_s[end+1] = +infinity.
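Here is a direct Python rendering of solution 1 using the bisect module (my own sketch; the function name is made up). bisect_right gives the number of values <= |a-b| and bisect_left the number of values < a+b, which is equivalent to the c_low/c_high bookkeeping above:
from bisect import bisect_left, bisect_right

def count_triangles(A, B, C):
    C_s = sorted(C)
    count = 0
    for a in A:
        for b in B:
            # valid third sides satisfy |a - b| < c < a + b
            too_small = bisect_right(C_s, abs(a - b))  # number of c <= |a - b|
            below_sum = bisect_left(C_s, a + b)        # number of c < a + b
            count += below_sum - too_small
    return count

print(count_triangles([3, 1, 2], [1, 5], [2, 4, 1]))  # 4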
Solution 2: cache values to get best case O(n^2) but worst case O(n^2 log n)
The algorithm above can be improved with a cache for the results of the binary searches. If you cache them in an associative array with insert and lookup both O(1), then this is faster in the best case, and probably also in the average case, depending on the distribution of inputs:
C_s = sort C in ascending order
low_cache = empty associative array
high_cache = empty associative array
for a in A:
    for b in B:
        if |a-b| in low_cache:
            c_low = low_cache{|a-b|}
        else:
            use binary search to find the highest index c_low in C_s s.t. C_s[c_low] <= |a-b|
            set low_cache{|a-b|} = c_low
        if a+b in high_cache:
            c_high = high_cache{a+b}
        else:
            use binary search to find lowest index c_high in C_s s.t. C_s[c_high] >= a+b
            set high_cache{a+b} = c_high
        count += (c_high-c_low-1)
Best case analysis: if A and B are both {1 .. n}, i.e. the n numbers between 1 and n, then the binary search for c_low will be called n times, and the binary search for c_high will be called 2*n-1 times, and the rest of the time the results will be found in the cache.
So the inner loop does 2*n^2 cache lookups with total time O(2*n^2 * 1) = O(n^2), plus 3*n-1 binary searches with total time O((3*n-1) log n) = O(n log n), with an overall best-case time of O(n^2).
Worst case analysis: the cache misses every time, we're back to the original algorithm with its O(n^2 log n).
Average case: this all depends on the distribution of inputs.
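In Python, functools.lru_cache can play the role of the two associative arrays; this is my compact sketch of solution 2 (names are mine), building on the counting function above:
from bisect import bisect_left, bisect_right
from functools import lru_cache

def count_triangles_cached(A, B, C):
    C_s = sorted(C)

    @lru_cache(maxsize=None)
    def n_too_small(diff):   # cached: number of c <= diff, where diff = |a - b|
        return bisect_right(C_s, diff)

    @lru_cache(maxsize=None)
    def n_below_sum(total):  # cached: number of c < total, where total = a + b
        return bisect_left(C_s, total)

    return sum(n_below_sum(a + b) - n_too_small(abs(a - b)) for a in A for b in B)

print(count_triangles_cached([3, 1, 2], [1, 5], [2, 4, 1]))  # 4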
More ideas I thought of
You might be able to make it faster by sorting A and B too. EDIT: see solution 3 below for how to get rid of the log n factor by sorting B. I haven't yet found a way to improve it further by sorting A, though.
For example, if you sort B, then as you move to the next b, you can assume the next c_low is greater than the previous one if |a-b| is greater, or vice versa. Similar logic applies to c_high. EDIT: the merge idea in solution 3 takes advantage of exactly this fact.
An easy optimization if A, B and C are of different sizes: make C the largest one of the three lists.
Another idea that doesn't actually help, but that I think is worth describing to guide further thought:
calculate all the values of |a-b| and a+b and store them in an array (time: O(n^2))
sort that array (O((n^2)log(n^2))=O(n^2 log n)) -- this is the step that breaks the idea and brings us back to O(n^2 log n),
use a merge algorithm between that sorted array and C_sorted (O(n^2+n)=O(n^2)) to extract the info from the results.
That is very fuzzy pseudo code, I know, but if sorting the array of n^2 elements could somehow be made faster, maybe by constructing that array in a cleverer way by sorting A and B first, the whole thing could come down to O(n^2), at the cost of significantly increased implementation complexity. However, thinking about this overnight, I don't see how to get rid of the log n factor here: you could make O(n) sorted lists of O(n) elements and merge them, but merging O(n) lists brings the log n factor right back in. It may be somewhat faster, but only by a constant factor; the complexity class stays the same.
Solution 3: use merge to bring it down to O(n^2)
Trying to figure out how I could take advantage of sorting B, I finally realized the idea was simple: use the merging algorithm of merge sort, which can merge two sorted lists of n elements in 2*n - 1 comparisons, but modify it to calculate c_low and c_high (as defined in solution 1) for all elements of B in O(n) operations.
For c_high, this is simple, because {a+b for b in B} sorts in the same order as B, but for c_low it will take two merging passes, since {|a-b| for b in B} is descending as b increases up to b<=a, and ascending again for b>=a.
Pseudo code:
B_s = sort B in ascending order
C_s = sort C in ascending order
C_s_rev = reverse C_s
for a in A:
    use binary search to find the highest index mid where B_s[mid] <= a
    use an adapted merge over B_s[0..mid] and C_s_rev to calculate c_low_i for i in 0 .. mid
    use an adapted merge over B_s[mid+1..end_B] and C_s to calculate c_low_i for i in mid+1 .. end_B
    use an adapted merge over B_s and C_s to calculate c_high_i for i in range(len(B_s))
    for i in 0 .. end_B:
        count += c_high_i - c_low_i - 1
Big-Oh analysis: the body of the loop takes O(log(n) + 2n + 2n + 2n + n) = O(n) and runs n times (over the elements of A) yielding a total complexity class of O(n^2).
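Here is my Python sketch of solution 3 (names are mine). One small deviation from the pseudo code: instead of merging against a reversed C_s, the b <= a half of B_s is walked backwards, so every pass sees ascending values of |a-b| or a+b and can share a single forward-moving pointer into C_s:
from bisect import bisect_right

def count_triangles_merge(A, B, C):
    B_s, C_s = sorted(B), sorted(C)
    nB, nC = len(B_s), len(C_s)
    total = 0
    for a in A:
        # For each b, the number of valid c is (#c < a + b) - (#c <= |a - b|).
        high = [0] * nB                   # high[i] = number of c with c < a + B_s[i]
        k = 0
        for i, b in enumerate(B_s):       # a + b ascends with b
            while k < nC and C_s[k] < a + b:
                k += 1
            high[i] = k
        low = [0] * nB                    # low[i] = number of c with c <= |a - B_s[i]|
        mid = bisect_right(B_s, a) - 1    # last index with B_s[mid] <= a
        k = 0
        for i in range(mid, -1, -1):      # a - B_s[i] ascends as i decreases
            while k < nC and C_s[k] <= a - B_s[i]:
                k += 1
            low[i] = k
        k = 0
        for i in range(mid + 1, nB):      # B_s[i] - a ascends as i increases
            while k < nC and C_s[k] <= B_s[i] - a:
                k += 1
            low[i] = k
        total += sum(h - l for h, l in zip(high, low))
    return total

print(count_triangles_merge([3, 1, 2], [1, 5], [2, 4, 1]))  # 4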

Space-Time complexity in variation of Dutch National Flag

A variation of the DNF is as follows:
def dutch_flag_partition(pivot_index, A):
    pivot = A[pivot_index]
    # First pass: group elements smaller than pivot.
    for i in range(len(A)):
        # Look for a smaller element.
        for j in range(i + 1, len(A)):
            if A[j] < pivot:
                A[i], A[j] = A[j], A[i]
                break
    # Second pass: group elements larger than pivot.
    for i in reversed(range(len(A))):
        if A[i] < pivot:
            break
        # Look for a larger element. Stop when we reach an element less than
        # pivot, since first pass has moved them to the start of A.
        for j in reversed(range(i)):
            if A[j] > pivot:
                A[i], A[j] = A[j], A[i]
                break
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length? And time complexity, given as O(N^2), is it so due to the nested loops? Thanks
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length?
No. Swapping, in fact, takes no extra space at all.
More importantly, you can't just look for one thing and say however much that thing takes, that's the complexity. You have to look over all the things, and the largest one determines the complexity. So, look over all the things you're creating:
pivot is just a reference to one of the list members, which is constant size.
a range is constant size.
an iterator over a range is constant-size.
the i and j integer values returned by the range iterator are constant size.1
…
Since nothing is larger than constant size, the total size is constant.
And time complexity, given as O(N^2), is it so due to the nested loops?
Well, yes, but you have to get a bit more detailed than that. Two nested loops don't necessarily mean quadratic. Two nested loops that do linear work inside the nested loop would be cubic. Two nested loops that combine so that the size of the inner loop depends inversely on the outer loop are linear. And so on.
And again, you have to add up everything, not just pick one thing and guess.
So, the first pass does:
A plain list indexing and assignment, constant.
A loop over the input length.
… with a loop over the input length
… with some list indexing, comparisons, and assignments, all constant
… which also breaks early in some cases… which we can come back to.
So, if the break doesn't help at all, that's O(1 + N * N * 1), which is O(N * N).
And the second pass is similarly O(N * (1 + N * 1)), which is again O(N * N).
And if you add O(N * N + N * N), you get O(N * N).
Also, even if the break made the first pass log-linear or something, O(N * log N + N * N) is still O(N * N), so it wouldn't matter.
So the time is quadratic.
1. Technically, this isn't quite true. Integers are variable-sized, and the memory they take is the log of their magnitude. So, i and j, and the stop attributes of the range objects, and probably some other stuff are all log N. But, unless you're dealing with huge-int arithmetic, like in crypto algorithms that multiply huge prime factors, people usually ignore this, and get away with it.
The additional space complexity is given as O(1). Is that because the swapping doesn't depend on the input length?
As you are "just" swapping there is no new data being created or generated, you are just reassigning values you already have, thus why the space complexity is constant.
And time complexity, given as O(N^2), is it so due to the nested loops?
True. It's a second order polynomial time complexity because you have two for loops nested.
There is a break in the loops, so in more favorable cases the running time will be below N^2. However, since big-O describes the worst case, it is fine to say the complexity is of degree 2.
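As a rough empirical check of the quadratic bound (my own sketch, assuming the dutch_flag_partition from the question is in scope): doubling the input size should roughly quadruple the running time.
import random
import timeit

def time_dnf(n):
    A = [random.randrange(n) for _ in range(n)]
    # list(A) copies the input on every call, so each timed run partitions the same data
    return timeit.timeit(lambda: dutch_flag_partition(len(A) // 2, list(A)), number=3)

for n in (500, 1000, 2000):
    print(n, round(time_dnf(n), 3))  # each timing should be roughly 4x the previous one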

Time complexity of a function

I'm trying to find out the time complexity (Big-O) of functions and trying to provide appropriate reason.
First function goes:
r = 0
# Assignment is constant time. Executed once. O(1)
for i in range(n):
    for j in range(i+1, n):
        for k in range(i, j):
            r += 1
            # Assignment and access are O(1). Executed n^3
like this.
I see that this is a triple nested loop, so it must be O(n^3), but I think my reasoning here is very weak. I don't really get what is going on inside the triple nested loop here.
Second function is:
i = n
# Assignment is constant time. Executed once. O(1)
while i > 0:
    k = 2 + 2
    i = i // 2
    # i is reduced by the statement above on each iteration,
    # so the assignment and access, which are O(1), are executed
    # log n times??
I worked out this algorithm to be O(1). But like the first function, I don't see what is going on in the while loop.
Can someone explain thoroughly the time complexity of the two functions? Thanks!
For such a simple case, you could find the number of iterations of the innermost loop as a function of n exactly:
sum_{i=0}^{n-1} sum_{j=i+1}^{n-1} sum_{k=i}^{j-1} 1 = n(n^2 - 1)/6
i.e., Θ(n**3) time complexity (see Big Theta). This assumes that r += 1 is O(1), which holds if r has O(log n) digits (the model has words with log n bits).
The second loop is even simpler: i //= 2 is the same as i >>= 1. n has Θ(log n) binary digits and each iteration drops one binary digit (shift right), therefore the whole loop is Θ(log n) time complexity if we assume that the i >> 1 shift of log(n) digits is an O(1) operation (same model as in the first example).
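A quick way to verify the closed form (my own sketch, not part of the answer): count the iterations of the innermost loop directly and compare against n(n^2 - 1)/6.
def innermost_iterations(n):
    r = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(i, j):
                r += 1
    return r

for n in (4, 8, 16, 32):
    # n(n^2 - 1) is a product of three consecutive integers, so it is divisible by 6
    assert innermost_iterations(n) == n * (n * n - 1) // 6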
Well first of all, for the first function: even though the parameters of each loop decrease each time, the total number of iterations still works out to roughly N^3/6, so the time complexity is O(N^3) rather than anything like O(N log N).
Also, for the second function, the runtime is O(log2 N). Say i == n == 2: after one iteration i is 1, and after another i is 0 (note that i = i // 2 is floor division, so i never becomes a fraction), at which point the loop stops.
For a rigorous mathematical approach to each function, you can go to https://www.coursera.org/course/algo. It's a great course for this sort of thing. I was sort of sloppy in my calculations.
