What is the time complexity of a while loop that uses random.shuffle (python) inside of it? - python

First of all, can we even measure it, since we don't know how many times random.shuffle will shuffle the array until it reaches the desired outcome?
def sort(numbers):
    import random
    while not sort(numbers) == numbers:
        random.shuffle(numbers)
    return numbers

First, I assume the function is not really named sort: either the inner sort(numbers) refers to an actual sorting function (which would make the exercise trivial) or to the function itself, which would lead to unconditional infinite recursion. I am assuming this function instead:
import random

def random_sort(numbers):
    while not sorted(numbers) == numbers:
        random.shuffle(numbers)
    return numbers
Without looking at the implementation too much, I would assume O(n) for the inner shuffle random.shuffle(numbers), where n is the number of elements in numbers.
Then we have the while loop. Each shuffle gives us one of all possible permutations of numbers, and the loop only stops at the sorted one, which is just one of those permutations (assuming the values are distinct, i.e. not drawn from a small number space).
This stopping is statistical, so technically we need to define which complexity we are speaking of. This is where best case, worst case and average case come in.
Best case
The numbers we get are already sorted. Then we only pay the cost of sorted(numbers) and the comparison == numbers. Sorting an already sorted array with Python's Timsort is O(n), so our best case complexity is O(n).
Worst case
The shuffle never gives us the right permutation. This is possible (although with probability approaching zero), and the algorithm would never terminate. So it's O(∞).
Average case
This is probably the most interesting case. First we need to establish how many permutations shuffle can give us. There is a linked discussion of that, where an approximation of e ⋅ n! is given, which is O(n!) (please check).
Now the question is: on average, when does our loop stop? This is also answered in the linked discussion: the number of tries follows a geometric distribution (please check), whose expected value is 1/p, where p is the probability of success on a single try. In our case this is p = 1 / (e ⋅ n!), so we need on average e ⋅ n! tries.
Now for each try we need to sort in O(n log(n)), compare in O(n) and compute the shuffle in O(n). For the shuffle we can say it uses the Fisher–Yates algorithm, which has a complexity of O(n), as shown here.
So we have O(n! ⋅ n log(n)) for the average complexity.
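As a rough sanity check of the geometric-distribution argument, here is a small simulation sketch (the helper name random_sort_with_count and the chosen run counts are mine, not from the question): it counts how many shuffles the loop needs and averages over many runs; the averages should grow roughly factorially with n.

import random

def random_sort_with_count(numbers):
    # Same loop as random_sort above, but instrumented to count shuffles.
    tries = 0
    while sorted(numbers) != numbers:
        random.shuffle(numbers)
        tries += 1
    return tries

# Average number of shuffles over many runs for a few small n.
# If the geometric-distribution argument holds, the averages should
# grow roughly like n! (about 6 for n = 3, 24 for n = 4, 120 for n = 5).
for n in range(2, 6):
    runs = 2000
    total = sum(random_sort_with_count(random.sample(range(n), n)) for _ in range(runs))
    print(n, total / runs)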

Related

What is the time complexity of a bubble sort algorithm applied n times to the same array?

I had this question on a test and I'm trying to understand it:
What is the time complexity of this function (in the worst case) assuming that Bubblesort() is the most optimized version of the Bubble Sort algorithm?
def MultSort(a, n):
    for i in range(n):
        BubbleSort(a)
The options were:
Linear
Quadratic
Cubic
I was thinking that the first sort (because it's the worst case) would be O(len(a)^2) and then n*O(len(a)) once it's ordered. But I can't really get to a result by myself.
The annoying thing in this exercise is that the answers to choose from do not indicate what the dimension of variation is by which the complexity would be linear, quadratic or cubic. There are two obvious candidates: len(a) and n. Since no distinction is made in the exercise, and no information is given as to whether one of the two is to be assumed constant, your best guess would be to assume they are equal (n = len(a)).
We could guess that n is given as a separate argument because the same function signature would be used in languages where there is no equivalent for the len() function, like for C arrays.
Anyway, with the assumption that n = len(a), here is what we can say:
the first sort (because it's the worst case) would be O(len(a)^2)
Yes, or in other words: O(๐‘›ยฒ)
and then n*O(len(a)) once it's ordered.
Yes, or in other words: 𝑛 · O(𝑛) = O(𝑛²)
But i can't really get to a result...
The final step is to add these two phases that you have analysed, so here is a spoiler in case you don't see how those can be added:
O(๐‘›ยฒ) + O(๐‘›ยฒ) = O(๐‘›ยฒ)
...which means the answer is:
Quadratic
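For concreteness, here is a sketch of what "the most optimized version of Bubble Sort" is usually taken to mean, with the standard early-exit flag (the function names are mine, not from the test): the first call may cost O(n²), but every later call sees an already sorted array and stops after a single O(n) pass.

def bubble_sort(a):
    """Bubble sort with the usual early-exit optimization."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:        # a full pass with no swaps: the array is sorted
            break

def mult_sort(a, n):
    # First call: O(n^2) in the worst case. Every subsequent call gets an
    # already-sorted array and stops after a single O(n) pass, so the total
    # is O(n^2) + (n - 1) * O(n) = O(n^2): quadratic.
    for _ in range(n):
        bubble_sort(a)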

How do you take randomness into account when finding the complexity of a function?

I've been trying to understand complexities, but all the online material has left me confused, especially the part where they create actual mathematical functions. I have a for loop and a while loop. My confusion arises from the while loop. I know the complexity of the for loop is O(n), but the while loop is based on randomness: a random number is picked, and if this number is not in the list, it is added and the while loop is broken. But my confusion arises here: the while loop may run, in the worst case (in my thoughts), for some m number of times until it is done. So I was thinking the complexity would then be O(n*m)?
I'm just really lost, and need some help.
Technically the worst-case complexity is O(∞): random.randint, if we consider it a true random generator (it isn't, of course), can produce an arbitrarily long sequence of equal elements. However, we can estimate an "average-case" complexity. It isn't the real average-case complexity (best, worst and average cases must be defined by the input, not randomly), but it shows how many iterations the program will do if we run it for a fixed n multiple times and take the average of the results.
Note that the list works as a set here (you never add a repeated number), so I'd stick with a not in set check instead, which is O(1) (while not in list is O(i)), to remove that complexity source and simplify things a bit: now the count of iterations and the complexity can be estimated with the same big-O limits. A single trial here is a draw from the uniform integer distribution on [1, n]; a success is drawing a number that is not in the set yet.
Then what is the expected number of trials before getting an item that is not in the set? The set size before each step is i in your code, so we can pick any of the n-i remaining numbers, and thus the probability of success is p_i = (n-i)/n (as the distribution is uniform). Every outer iteration is an instance of the geometric distribution: the count of trials until the first success. So the expected count of while iterations is n_i = 1/p_i = n/(n-i). To get the final complexity we sum these counts over the for iterations: sum(n_i for i in range(n)). This is equal to n * Harmonic(n), where Harmonic(n) is the n-th harmonic number (the sum of the reciprocals of the first n natural numbers). Harmonic(n) ~ O(log n), thus the "average-case" complexity of this code is O(n log n).
For a list it will be sum(i*n / (n-i) for i in range(n)) ~ O(n^2 log(n)) (the proof of this equality is a little longer).
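The question's actual code isn't shown, so here is a guess at its shape based on the description above (the function name fill_with_unique and the details are mine); the comments mark where the O(1) set check versus the O(i) list check makes the difference discussed above.

import random

def fill_with_unique(n):
    """Draw uniform integers in [1, n] until n distinct values are collected."""
    seen = set()              # 'x not in seen' is O(1); 'x not in result' would be O(i)
    result = []
    for i in range(n):        # the outer for loop: exactly n iterations
        while True:           # geometric number of trials, expected n / (n - i)
            x = random.randint(1, n)
            if x not in seen:
                seen.add(x)
                result.append(x)
                break
    return result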
Big-O notation is used for worst case scenarios only.
Figure out what the worst case for the given loop could be.
Then make a function in n, take the highest power of n and ignore any constants; that gives you the time complexity.

Best Case and Worst Case of an Algorithm: When are they considered to be the "same"?

I am trying to determine the Best Case and Worst Case of the algorithm below, but I have been really confused since our professor claims that the Best Case and Worst Case of the algorithm are the "same".
import math

def sum(Arr, left, right):
    if left > right:
        return 0
    elif left == right:
        return Arr[left]
    else:
        mid = left + math.floor((right - left) / 2)
        leftSum = sum(Arr, left, mid)
        rightSum = sum(Arr, mid + 1, right)
        return leftSum + rightSum

def findSum(Arr):
    return sum(Arr, 0, len(Arr) - 1)

arr = [1, 2, 3, 4, 5, 7, 8, 9]
print(findSum(arr))
The algorithm finds the sum of an array by divide-and-conquer, and here were my initial thought:
Best Case:
The input array only has 1 element, hence it meets the second condition and returns Arr[left]. No recursion is involved, so it takes constant time (O(1)).
Worst Case:
The input array could be arbitrarily large (N elements), and each element needs to be scanned at least once, so it would take linear time (O(N)).
I saw an explanation from another post with similar questions, and the explanation was that no matter how large or small the input array is, the algorithm has to loop over it once, so it's linear time for both the worst and best case. But I am still confused: the best case of the algorithm does not involve any recursion, so can we still claim that it is linear time (O(N)) instead of constant time (O(1))?
Your understanding of time complexity is wrong, and you are mixing it up with best-case/worst-case scenarios, which makes it more confusing. An algorithm isn't considered constant just because the input data is of length 1 and it does only one operation. The length of the input data has nothing to do with the best/worst case scenarios.
Time complexity relates the growth of the number of operations an algorithm does to its input data. So, we say an algorithm is constant when there is no relation between the input data and the number of operations. Or an algorithm is linear if the number of its operations grows linearly compared to the quantity of the input data.
The best/worst case scenarios are about the content of the data and how it's already structured. This is relevant in, say, some sorting algorithms. Insertion sort, for example, will only do linear work if the data is already sorted (best case) but quadratic work if it isn't. On the other hand, merge sort will do linearithmic work regardless of the sorting status of the input data.
I repeat, time complexity is the rate of growth of the number of operations an algorithm does in relation to the growth of its input data.
Now, to your example: does the data (not its quantity) have any impact on your algorithm? Does it matter for summing whether the numbers are, say, sorted or not? Or whether they are small or large (within reason, of course)? It does not. The work is always the same. So the best and worst case become the same, as your professor says.
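If it helps, here is a minimal insertion sort sketch illustrating the contrast mentioned above (generic textbook code, not from the question): the same input size can cost linear or quadratic work depending on how the data is arranged, which is exactly what best and worst case describe.

def insertion_sort(a):
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Already-sorted input: this while loop never runs, so the total work is O(n) (best case).
        # Reverse-sorted input: it runs i times for every i, so the total work is O(n^2) (worst case).
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a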

Time complexity of string permutation algorithm

I wrote a simple algorithm to return a list of all possible permutations of a string, as follows:
def get_permutations(sequence):
    '''
    Enumerate all permutations of a given string.

    sequence (string): an arbitrary string to permute. Assume that it is a
    non-empty string.

    Returns: a list of all permutations of sequence
    '''
    if len(sequence) <= 1:
        return list(sequence)
    else:
        return_list = get_permutations(sequence[1:])
        new_list = []
        for e in return_list:
            for pos in range(len(e) + 1):
                new_list.append(e[:pos] + sequence[0] + e[pos:])
        return new_list
From this code I'm seeing a time complexity of O(n · n!): O(n!) is the growth of the number of elements e in return_list, and there's a nested loop that grows linearly with each new recursion, so, from my understanding, O(n). The conclusion is that the algorithm as a whole has O(n · n!) complexity.
However, when searching for similar solutions I found many threads saying the optimal case for this type of algorithm should be only O(n!), so my question is:
Am I missing something on my complexity analysis or is my code not optimal? And if it isn't, how can I properly correct it?
Take any algorithm that generates and then prints out all permutations of a sequence of n different elements. Then, since there are n! different permutations and each one has n elements, simply printing out all the permutations will take time ฮ˜(n ยท n!). That's worth keeping in mind as you evaluate the cost of generating permutations - even if you could generate all permutations in time O(n!), you couldn't then visit all those permutations without doing O(n ยท n!) work to view them all.
That being said - the recursive permutation-generating code you have up above does indeed run in time ฮ˜(n ยท n!). There are some other algorithms for generating permutations that can generate but not print the permutations in time ฮ˜(n!), but they work on different principles.
I have found, empirically, that unless you see a careful runtime analysis of a permutation-generating algorithm, you should be skeptical that the runtime is ฮ˜(n!). Most algorithms don't hit this runtime, and in the cases of the ones that do, the analysis is somewhat subtle. Stated differently - you're not missing anything; there's just lots of "on the right track but incorrect" claims made out there. :-)
I think your algorithm is O(n · n!) because, to calculate the permutations of a string x of length n, your algorithm uses the permutations of a substring of x, which is x without the first character. I'll call this substring y. But to calculate the permutations of y, the permutations of a substring of y need to be calculated. This continues until the substring to have its permutations calculated is of length 1. This means that to calculate the permutations of x you will need to calculate the permutations of n - 1 other strings.
Here is an example. Let's say the input string was "pie". Then what your algorithm does is take "pie" and call itself again with "ie", after which it calls itself with "e". Because "e" is of length 1, it returns, and all the permutations for "ie" are found, which are "ie" and "ei". Then that function call returns the permutations of "ie", and it is only at this point that the permutations of "pie" are calculated, which it does using the permutations of "ie".
I looked up a permutation-generating algorithm called Heap's algorithm that has a time complexity of O(n!). The reason it has a time complexity of n! is that it generates permutations using swaps, and each swap it does on an array produces a unique permutation of the input string. Your algorithm, however, generates permutations of the n - 1 substrings of the input string, which is where the time complexity of O(n · n!) comes from.
I hope this helps and sorry if I'm being overly verbose.
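For reference, here is a sketch of Heap's algorithm as described above (my own code, following the usual formulation of the algorithm): each new permutation is obtained from the previous one by a single swap, so generation alone costs Θ(n!) swaps, although copying or printing every permutation would still add a factor of n.

def heap_permutations(a, k=None):
    """Yield each permutation of the list a in place via Heap's algorithm."""
    if k is None:
        k = len(a)
    if k == 1:
        yield a                       # caller should copy this if it wants to keep it
        return
    for i in range(k - 1):
        yield from heap_permutations(a, k - 1)
        if k % 2 == 0:
            a[i], a[k - 1] = a[k - 1], a[i]   # even k: swap i-th element with the last
        else:
            a[0], a[k - 1] = a[k - 1], a[0]   # odd k: swap first element with the last
    yield from heap_permutations(a, k - 1)

# Example: permutations of "pie", copied into strings for display.
print(["".join(p) for p in heap_permutations(list("pie"))])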

Python - Sum of numbers

I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
I am using python:
limit = 10
sums = []
for x in range(1, limit + 1):
    for y in range(1, limit + 1):
        sums.append(x + y)
This works just fine, however, because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Is there any way of doing this without a nested loop?
(This is just a simplification of something that I need to do to solve a ProjectEuler problem. It involves obtaining the sum of all abundant numbers.)
[x + y for x in xrange(limit + 1) for y in xrange(x + 1)]
This still performs just as many calculations but will do it about twice as fast as a for loop.
from itertools import combinations
(a + b for a, b in combinations(xrange(n + 1), 2))
This avoids a lot of duplicate sums. I don't know if you want to keep track of those or not.
If you just want every sum with no representation of how you got it, then xrange(2*n + 2) gives you what you want with no duplicates or looping at all.
In response to the question:
[x + y for x in set1 for y in set2]
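As a concrete illustration of the combinations idea above (adapted here to the question's 1..limit range; note that pairs like (x, x) are skipped, unlike in the nested loops):

from itertools import combinations

limit = 10
# Each unordered pair (a, b) with a < b appears exactly once,
# so we get C(10, 2) = 45 sums instead of 10 * 10 = 100.
pair_sums = [a + b for a, b in combinations(range(1, limit + 1), 2)]
print(len(pair_sums))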
I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
So you want to compute limit**2 sums.
because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Wrong: it's not "because of the nested loops" -- it's because you're computing a quadratic number of sums, and therefore doing a quadratic amount of work.
Is there any way of doing this without a nested loop?
You can mask the nesting, as in #aaron's answer, and you can halve the number of sums you compute due to the problem's symmetry (though that doesn't do the same thing as your code), but, to prepare a list with a quadratic number of items, there's absolutely no way to avoid doing a quadratic amount of work.
However, for your stated purpose
obtaining the sum of all abundant numbers.
you'd need an infinite amount of work, since there are infinitely many abundant numbers ;-).
I think you have in mind problem 23, which is actually very different: it asks for the sum of all numbers that cannot be expressed as the sum of two abundant numbers. How the summation you're asking about would help you move closer to that solution really escapes me.
I'm not sure there is a good way of doing this without nested loops.
If I were in your shoes, I would write it as follows:
[x + y for x in range(1, limit + 1) for y in range(1, limit + 1)]
