Most people with a degree in CS will certainly know what Big O stands for.
It helps us to measure how well an algorithm scales.
But I'm curious, how do you calculate or approximate the complexity of your algorithms?
I'll do my best to explain it here on simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information on the Chapter 2 of the Data Structures and Algorithms in Java book.
There is no mechanical procedure that can be used to get the BigOh.
As a "cookbook", to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.
The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The lesser the number of steps, the faster the algorithm.
For example, let's say you have this piece of code:
int sum(int* data, int N) {
int result = 0; // 1
for (int i = 0; i < N; i++) { // 2
result += data[i]; // 3
}
return result; // 4
}
This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:
Number_Of_Steps = f(N)
So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called such as:
Number_Of_Steps = f(data.length)
The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.
There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn't depend on the size of the input data takes a constant C number computational steps.
We are going to add the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array.
That means that lines 1 and 4 takes C amount of steps each, and the function is somewhat like this:
f(N) = C + ??? + C
The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That's the same as adding C, N times:
f(N) = C + (C + C + ... + C) + C = C + N * C + C
There is no mechanical rule to count how many times the body of the for gets executed, you need to count it by looking at what does the code do. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.
To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:
Take away all the constants C.
From f() get the polynomium in its standard form.
Divide the terms of the polynomium and sort them by the rate of growth.
Keep the one that grows bigger when N approaches infinity.
Our f() has two terms:
f(N) = 2 * C * N ^ 0 + 1 * C * N ^ 1
Taking away all the C constants and redundant parts:
f(N) = 1 + N ^ 1
Since the last term is the one which grows bigger when f() approaches infinity (think on limits) this is the BigOh argument, and the sum() function has a BigOh of:
O(N)
There are a few tricks to solve some tricky ones: use summations whenever you can.
As an example, this code can be easily solved using summations:
for (i = 0; i < 2*n; i += 2) { // 1
for (j=n; j > i; j--) { // 2
foo(); // 3
}
}
The first thing you needed to be asked is the order of execution of foo(). While the usual is to be O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.
The for statement on the sentence number one is tricky. While the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N steps, and we need to divide the count by two.
f(N) = Summation(i from 1 to 2 * N / 2)( ... ) =
= Summation(i from 1 to N)( ... )
The sentence number two is even trickier since it depends on the value of i. Take a look: the index i takes the values: 0, 2, 4, 6, 8, ..., 2 * N, and the second for get executed: N times the first one, N - 2 the second, N - 4 the third... up to the N / 2 stage, on which the second for never gets executed.
On formula, that means:
f(N) = Summation(i from 1 to N)( Summation(j = ???)( ) )
Again, we are counting the number of steps. And by definition, every summation should always start at one, and end at a number bigger-or-equal than one.
f(N) = Summation(i from 1 to N)( Summation(j = 1 to (N - (i - 1) * 2)( C ) )
(We are assuming that foo() is O(1) and takes C steps.)
We have a problem here: when i takes the value N / 2 + 1 upwards, the inner Summation ends at a negative number! That's impossible and wrong. We need to split the summation in two, being the pivotal point the moment i takes N / 2 + 1.
f(N) = Summation(i from 1 to N / 2)( Summation(j = 1 to (N - (i - 1) * 2)) * ( C ) ) + Summation(i from 1 to N / 2) * ( C )
Since the pivotal moment i > N / 2, the inner for won't get executed, and we are assuming a constant C execution complexity on its body.
Now the summations can be simplified using some identity rules:
Summation(w from 1 to N)( C ) = N * C
Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of w)
Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2
Applying some algebra:
f(N) = Summation(i from 1 to N / 2)( (N - (i - 1) * 2) * ( C ) ) + (N / 2)( C )
f(N) = C * Summation(i from 1 to N / 2)( (N - (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (Summation(i from 1 to N / 2)( N ) - Summation(i from 1 to N / 2)( (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2)( i - 1 )) + (N / 2)( C )
=> Summation(i from 1 to N / 2)( i - 1 ) = Summation(i from 1 to N / 2 - 1)( i )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2 - 1)( i )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N / 2 - 1) * (N / 2 - 1 + 1) / 2) ) + (N / 2)( C )
=> (N / 2 - 1) * (N / 2 - 1 + 1) / 2 =
(N / 2 - 1) * (N / 2) / 2 =
((N ^ 2 / 4) - (N / 2)) / 2 =
(N ^ 2 / 8) - (N / 4)
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N ^ 2 / 8) - (N / 4) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - ( (N ^ 2 / 4) - (N / 2) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - (N ^ 2 / 4) + (N / 2)) + (N / 2)( C )
f(N) = C * ( N ^ 2 / 4 ) + C * (N / 2) + C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + 2 * C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + C * N
f(N) = C * 1/4 * N ^ 2 + C * N
And the BigOh is:
O(N²)
Big O gives the upper bound for time complexity of an algorithm. It is usually used in conjunction with processing data sets (lists) but can be used elsewhere.
A few examples of how it's used in C code.
Say we have an array of n elements
int array[n];
If we wanted to access the first element of the array this would be O(1) since it doesn't matter how big the array is, it always takes the same constant time to get the first item.
x = array[0];
If we wanted to find a number in the list:
for(int i = 0; i < n; i++){
if(array[i] == numToFind){ return i; }
}
This would be O(n) since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm (omega is for lower bound and theta is for tight bound).
When we get to nested loops:
for(int i = 0; i < n; i++){
for(int j = i; j < n; j++){
array[j] += 2;
}
}
This is O(n^2) since for each pass of the outer loop ( O(n) ) we have to go through the entire list again so the n's multiply leaving us with n squared.
This is barely scratching the surface but when you get to analyzing more complex algorithms complex math involving proofs comes into play. Hope this familiarizes you with the basics at least though.
While knowing how to figure out the Big O time for your particular problem is useful, knowing some general cases can go a long way in helping you make decisions in your algorithm.
Here are some of the most common cases, lifted from http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions:
O(1) - Determining if a number is even or odd; using a constant-size lookup table or hash table
O(logn) - Finding an item in a sorted array with a binary search
O(n) - Finding an item in an unsorted list; adding two n-digit numbers
O(n2) - Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort or insertion sort
O(n3) - Multiplying two n×n matrices by simple algorithm
O(cn) - Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force
O(n!) - Solving the traveling salesman problem via brute-force search
O(nn) - Often used instead of O(n!) to derive simpler formulas for asymptotic complexity
Small reminder: the big O notation is used to denote asymptotic complexity (that is, when the size of the problem grows to infinity), and it hides a constant.
This means that between an algorithm in O(n) and one in O(n2), the fastest is not always the first one (though there always exists a value of n such that for problems of size >n, the first algorithm is the fastest).
Note that the hidden constant very much depends on the implementation!
Also, in some cases, the runtime is not a deterministic function of the size n of the input. Take sorting using quick sort for example: the time needed to sort an array of n elements is not a constant but depends on the starting configuration of the array.
There are different time complexities:
Worst case (usually the simplest to figure out, though not always very meaningful)
Average case (usually much harder to figure out...)
...
A good introduction is An Introduction to the Analysis of Algorithms by R. Sedgewick and P. Flajolet.
As you say, premature optimisation is the root of all evil, and (if possible) profiling really should always be used when optimising code. It can even help you determine the complexity of your algorithms.
Seeing the answers here I think we can conclude that most of us do indeed approximate the order of the algorithm by looking at it and use common sense instead of calculating it with, for example, the master method as we were thought at university.
With that said I must add that even the professor encouraged us (later on) to actually think about it instead of just calculating it.
Also I would like to add how it is done for recursive functions:
suppose we have a function like (scheme code):
(define (fac n)
(if (= n 0)
1
(* n (fac (- n 1)))))
which recursively calculates the factorial of the given number.
The first step is to try and determine the performance characteristic for the body of the function only in this case, nothing special is done in the body, just a multiplication (or the return of the value 1).
So the performance for the body is: O(1) (constant).
Next try and determine this for the number of recursive calls. In this case we have n-1 recursive calls.
So the performance for the recursive calls is: O(n-1) (order is n, as we throw away the insignificant parts).
Then put those two together and you then have the performance for the whole recursive function:
1 * (n-1) = O(n)
Peter, to answer your raised issues; the method I describe here actually handles this quite well. But keep in mind that this is still an approximation and not a full mathematically correct answer. The method described here is also one of the methods we were taught at university, and if I remember correctly was used for far more advanced algorithms than the factorial I used in this example.
Of course it all depends on how well you can estimate the running time of the body of the function and the number of recursive calls, but that is just as true for the other methods.
If your cost is a polynomial, just keep the highest-order term, without its multiplier. E.g.:
O((n/2 + 1)*(n/2)) = O(n2/4 + n/2) = O(n2/4) = O(n2)
This doesn't work for infinite series, mind you. There is no single recipe for the general case, though for some common cases, the following inequalities apply:
O(log N) < O(N) < O(N log N) < O(N2) < O(Nk) < O(en) < O(n!)
I think about it in terms of information. Any problem consists of learning a certain number of bits.
Your basic tool is the concept of decision points and their entropy. The entropy of a decision point is the average information it will give you. For example, if a program contains a decision point with two branches, it's entropy is the sum of the probability of each branch times the log2 of the inverse probability of that branch. That's how much you learn by executing that decision.
For example, an if statement having two branches, both equally likely, has an entropy of 1/2 * log(2/1) + 1/2 * log(2/1) = 1/2 * 1 + 1/2 * 1 = 1. So its entropy is 1 bit.
Suppose you are searching a table of N items, like N=1024. That is a 10-bit problem because log(1024) = 10 bits. So if you can search it with IF statements that have equally likely outcomes, it should take 10 decisions.
That's what you get with binary search.
Suppose you are doing linear search. You look at the first element and ask if it's the one you want. The probabilities are 1/1024 that it is, and 1023/1024 that it isn't. The entropy of that decision is 1/1024*log(1024/1) + 1023/1024 * log(1024/1023) = 1/1024 * 10 + 1023/1024 * about 0 = about .01 bit. You've learned very little! The second decision isn't much better. That is why linear search is so slow. In fact it's exponential in the number of bits you need to learn.
Suppose you are doing indexing. Suppose the table is pre-sorted into a lot of bins, and you use some of all of the bits in the key to index directly to the table entry. If there are 1024 bins, the entropy is 1/1024 * log(1024) + 1/1024 * log(1024) + ... for all 1024 possible outcomes. This is 1/1024 * 10 times 1024 outcomes, or 10 bits of entropy for that one indexing operation. That is why indexing search is fast.
Now think about sorting. You have N items, and you have a list. For each item, you have to search for where the item goes in the list, and then add it to the list. So sorting takes roughly N times the number of steps of the underlying search.
So sorts based on binary decisions having roughly equally likely outcomes all take about O(N log N) steps. An O(N) sort algorithm is possible if it is based on indexing search.
I've found that nearly all algorithmic performance issues can be looked at in this way.
Lets start from the beginning.
First of all, accept the principle that certain simple operations on data can be done in O(1) time, that is, in time that is independent of the size of the input. These primitive operations in C consist of
Arithmetic operations (e.g. + or %).
Logical operations (e.g., &&).
Comparison operations (e.g., <=).
Structure accessing operations (e.g. array-indexing like A[i], or pointer fol-
lowing with the -> operator).
Simple assignment such as copying a value into a variable.
Calls to library functions (e.g., scanf, printf).
The justification for this principle requires a detailed study of the machine instructions (primitive steps) of a typical computer. Each of the described operations can be done with some small number of machine instructions; often only one or two instructions are needed.
As a consequence, several kinds of statements in C can be executed in O(1) time, that is, in some constant amount of time independent of input. These simple include
Assignment statements that do not involve function calls in their expressions.
Read statements.
Write statements that do not require function calls to evaluate arguments.
The jump statements break, continue, goto, and return expression, where
expression does not contain a function call.
In C, many for-loops are formed by initializing an index variable to some value and
incrementing that variable by 1 each time around the loop. The for-loop ends when
the index reaches some limit. For instance, the for-loop
for (i = 0; i < n-1; i++)
{
small = i;
for (j = i+1; j < n; j++)
if (A[j] < A[small])
small = j;
temp = A[small];
A[small] = A[i];
A[i] = temp;
}
uses index variable i. It increments i by 1 each time around the loop, and the iterations
stop when i reaches n − 1.
However, for the moment, focus on the simple form of for-loop, where the difference between the final and initial values, divided by the amount by which the index variable is incremented tells us how many times we go around the loop. That count is exact, unless there are ways to exit the loop via a jump statement; it is an upper bound on the number of iterations in any case.
For instance, the for-loop iterates ((n − 1) − 0)/1 = n − 1 times,
since 0 is the initial value of i, n − 1 is the highest value reached by i (i.e., when i
reaches n−1, the loop stops and no iteration occurs with i = n−1), and 1 is added
to i at each iteration of the loop.
In the simplest case, where the time spent in the loop body is the same for each
iteration, we can multiply the big-oh upper bound for the body by the number of
times around the loop. Strictly speaking, we must then add O(1) time to initialize
the loop index and O(1) time for the first comparison of the loop index with the
limit, because we test one more time than we go around the loop. However, unless
it is possible to execute the loop zero times, the time to initialize the loop and test
the limit once is a low-order term that can be dropped by the summation rule.
Now consider this example:
(1) for (j = 0; j < n; j++)
(2) A[i][j] = 0;
We know that line (1) takes O(1) time. Clearly, we go around the loop n times, as
we can determine by subtracting the lower limit from the upper limit found on line
(1) and then adding 1. Since the body, line (2), takes O(1) time, we can neglect the
time to increment j and the time to compare j with n, both of which are also O(1).
Thus, the running time of lines (1) and (2) is the product of n and O(1), which is O(n).
Similarly, we can bound the running time of the outer loop consisting of lines
(2) through (4), which is
(2) for (i = 0; i < n; i++)
(3) for (j = 0; j < n; j++)
(4) A[i][j] = 0;
We have already established that the loop of lines (3) and (4) takes O(n) time.
Thus, we can neglect the O(1) time to increment i and to test whether i < n in
each iteration, concluding that each iteration of the outer loop takes O(n) time.
The initialization i = 0 of the outer loop and the (n + 1)st test of the condition
i < n likewise take O(1) time and can be neglected. Finally, we observe that we go
around the outer loop n times, taking O(n) time for each iteration, giving a total
O(n^2) running time.
A more practical example.
If you want to estimate the order of your code empirically rather than by analyzing the code, you could stick in a series of increasing values of n and time your code. Plot your timings on a log scale. If the code is O(x^n), the values should fall on a line of slope n.
This has several advantages over just studying the code. For one thing, you can see whether you're in the range where the run time approaches its asymptotic order. Also, you may find that some code that you thought was order O(x) is really order O(x^2), for example, because of time spent in library calls.
Basically the thing that crops up 90% of the time is just analyzing loops. Do you have single, double, triple nested loops? The you have O(n), O(n^2), O(n^3) running time.
Very rarely (unless you are writing a platform with an extensive base library (like for instance, the .NET BCL, or C++'s STL) you will encounter anything that is more difficult than just looking at your loops (for statements, while, goto, etc...)
Less useful generally, I think, but for the sake of completeness there is also a Big Omega Ω, which defines a lower-bound on an algorithm's complexity, and a Big Theta Θ, which defines both an upper and lower bound.
Big O notation is useful because it's easy to work with and hides unnecessary complications and details (for some definition of unnecessary). One nice way of working out the complexity of divide and conquer algorithms is the tree method. Let's say you have a version of quicksort with the median procedure, so you split the array into perfectly balanced subarrays every time.
Now build a tree corresponding to all the arrays you work with. At the root you have the original array, the root has two children which are the subarrays. Repeat this until you have single element arrays at the bottom.
Since we can find the median in O(n) time and split the array in two parts in O(n) time, the work done at each node is O(k) where k is the size of the array. Each level of the tree contains (at most) the entire array so the work per level is O(n) (the sizes of the subarrays add up to n, and since we have O(k) per level we can add this up). There are only log(n) levels in the tree since each time we halve the input.
Therefore we can upper bound the amount of work by O(n*log(n)).
However, Big O hides some details which we sometimes can't ignore. Consider computing the Fibonacci sequence with
a=0;
b=1;
for (i = 0; i <n; i++) {
tmp = b;
b = a + b;
a = tmp;
}
and lets just assume the a and b are BigIntegers in Java or something that can handle arbitrarily large numbers. Most people would say this is an O(n) algorithm without flinching. The reasoning is that you have n iterations in the for loop and O(1) work in side the loop.
But Fibonacci numbers are large, the n-th Fibonacci number is exponential in n so just storing it will take on the order of n bytes. Performing addition with big integers will take O(n) amount of work. So the total amount of work done in this procedure is
1 + 2 + 3 + ... + n = n(n-1)/2 = O(n^2)
So this algorithm runs in quadradic time!
Familiarity with the algorithms/data structures I use and/or quick glance analysis of iteration nesting. The difficulty is when you call a library function, possibly multiple times - you can often be unsure of whether you are calling the function unnecessarily at times or what implementation they are using. Maybe library functions should have a complexity/efficiency measure, whether that be Big O or some other metric, that is available in documentation or even IntelliSense.
Break down the algorithm into pieces you know the big O notation for, and combine through big O operators. That's the only way I know of.
For more information, check the Wikipedia page on the subject.
As to "how do you calculate" Big O, this is part of Computational complexity theory. For some (many) special cases you may be able to come with some simple heuristics (like multiplying loop counts for nested loops), esp. when all you want is any upper bound estimation, and you do not mind if it is too pessimistic - which I guess is probably what your question is about.
If you really want to answer your question for any algorithm the best you can do is to apply the theory. Besides of simplistic "worst case" analysis I have found Amortized analysis very useful in practice.
For the 1st case, the inner loop is executed n-i times, so the total number of executions is the sum for i going from 0 to n-1 (because lower than, not lower than or equal) of the n-i. You get finally n*(n + 1) / 2, so O(n²/2) = O(n²).
For the 2nd loop, i is between 0 and n included for the outer loop; then the inner loop is executed when j is strictly greater than n, which is then impossible.
I would like to explain the Big-O in a little bit different aspect.
Big-O is just to compare the complexity of the programs which means how fast are they growing when the inputs are increasing and not the exact time which is spend to do the action.
IMHO in the big-O formulas you better not to use more complex equations (you might just stick to the ones in the following graph.) However you still might use other more precise formula (like 3^n, n^3, ...) but more than that can be sometimes misleading! So better to keep it as simple as possible.
I would like to emphasize once again that here we don't want to get an exact formula for our algorithm. We only want to show how it grows when the inputs are growing and compare with the other algorithms in that sense. Otherwise you would better use different methods like bench-marking.
In addition to using the master method (or one of its specializations), I test my algorithms experimentally. This can't prove that any particular complexity class is achieved, but it can provide reassurance that the mathematical analysis is appropriate. To help with this reassurance, I use code coverage tools in conjunction with my experiments, to ensure that I'm exercising all the cases.
As a very simple example say you wanted to do a sanity check on the speed of the .NET framework's list sort. You could write something like the following, then analyze the results in Excel to make sure they did not exceed an n*log(n) curve.
In this example I measure the number of comparisons, but it's also prudent to examine the actual time required for each sample size. However then you must be even more careful that you are just measuring the algorithm and not including artifacts from your test infrastructure.
int nCmp = 0;
System.Random rnd = new System.Random();
// measure the time required to sort a list of n integers
void DoTest(int n)
{
List<int> lst = new List<int>(n);
for( int i=0; i<n; i++ )
lst[i] = rnd.Next(0,1000);
// as we sort, keep track of the number of comparisons performed!
nCmp = 0;
lst.Sort( delegate( int a, int b ) { nCmp++; return (a<b)?-1:((a>b)?1:0)); }
System.Console.Writeline( "{0},{1}", n, nCmp );
}
// Perform measurement for a variety of sample sizes.
// It would be prudent to check multiple random samples of each size, but this is OK for a quick sanity check
for( int n = 0; n<1000; n++ )
DoTest(n);
Don't forget to also allow for space complexities that can also be a cause for concern if one has limited memory resources. So for example you may hear someone wanting a constant space algorithm which is basically a way of saying that the amount of space taken by the algorithm doesn't depend on any factors inside the code.
Sometimes the complexity can come from how many times is something called, how often is a loop executed, how often is memory allocated, and so on is another part to answer this question.
Lastly, big O can be used for worst case, best case, and amortization cases where generally it is the worst case that is used for describing how bad an algorithm may be.
First of all, the accepted answer is trying to explain nice fancy stuff,
but I think, intentionally complicating Big-Oh is not the solution,
which programmers (or at least, people like me) search for.
Big Oh (in short)
function f(text) {
var n = text.length;
for (var i = 0; i < n; i++) {
f(text.slice(0, n-1))
}
// ... other JS logic here, which we can ignore ...
}
Big Oh of above is f(n) = O(n!) where n represents number of items in input set,
and f represents operation done per item.
Big-Oh notation is the asymptotic upper-bound of the complexity of an algorithm.
In programming: The assumed worst-case time taken,
or assumed maximum repeat count of logic, for size of the input.
Calculation
Keep in mind (from above meaning) that; We just need worst-case time and/or maximum repeat count affected by N (size of input),
Then take another look at (accepted answer's) example:
for (i = 0; i < 2*n; i += 2) { // line 123
for (j=n; j > i; j--) { // line 124
foo(); // line 125
}
}
Begin with this search-pattern:
Find first line that N caused repeat behavior,
Or caused increase of logic executed,
But constant or not, ignore anything before that line.
Seems line hundred-twenty-three is what we are searching ;-)
On first sight, line seems to have 2*n max-looping.
But looking again, we see i += 2 (and that half is skipped).
So, max repeat is simply n, write it down, like f(n) = O( n but don't close parenthesis yet.
Repeat search till method's end, and find next line matching our search-pattern, here that's line 124
Which is tricky, because strange condition, and reverse looping.
But after remembering that we just need to consider maximum repeat count (or worst-case time taken).
It's as easy as saying "Reverse-Loop j starts with j=n, am I right? yes, n seems to be maximum possible repeat count", so:
Add n to previous write down's end,
but like "( n " instead of "+ n" (as this is inside previous loop),
and close parenthesis only if we find something outside of previous loop.
Search Done! why? because line 125 (or any other line after) does not match our search-pattern.
We can now close any parenthesis (left-open in our write down), resulting in below:
f(n) = O( n( n ) )
Try to further shorten "n( n )" part, like:
n( n ) = n * n
= n2
Finally, just wrap it with Big Oh notation, like O(n2) or O(n^2) without formatting.
What often gets overlooked is the expected behavior of your algorithms. It doesn't change the Big-O of your algorithm, but it does relate to the statement "premature optimization. . .."
Expected behavior of your algorithm is -- very dumbed down -- how fast you can expect your algorithm to work on data you're most likely to see.
For instance, if you're searching for a value in a list, it's O(n), but if you know that most lists you see have your value up front, typical behavior of your algorithm is faster.
To really nail it down, you need to be able to describe the probability distribution of your "input space" (if you need to sort a list, how often is that list already going to be sorted? how often is it totally reversed? how often is it mostly sorted?) It's not always feasible that you know that, but sometimes you do.
great question!
Disclaimer: this answer contains false statements see the comments below.
If you're using the Big O, you're talking about the worse case (more on what that means later). Additionally, there is capital theta for average case and a big omega for best case.
Check out this site for a lovely formal definition of Big O: https://xlinux.nist.gov/dads/HTML/bigOnotation.html
f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
Ok, so now what do we mean by "best-case" and "worst-case" complexities?
This is probably most clearly illustrated through examples. For example if we are using linear search to find a number in a sorted array then the worst case is when we decide to search for the last element of the array as this would take as many steps as there are items in the array. The best case would be when we search for the first element since we would be done after the first check.
The point of all these adjective-case complexities is that we're looking for a way to graph the amount of time a hypothetical program runs to completion in terms of the size of particular variables. However for many algorithms you can argue that there is not a single time for a particular size of input. Notice that this contradicts with the fundamental requirement of a function, any input should have no more than one output. So we come up with multiple functions to describe an algorithm's complexity. Now, even though searching an array of size n may take varying amounts of time depending on what you're looking for in the array and depending proportionally to n, we can create an informative description of the algorithm using best-case, average-case, and worst-case classes.
Sorry this is so poorly written and lacks much technical information. But hopefully it'll make time complexity classes easier to think about. Once you become comfortable with these it becomes a simple matter of parsing through your program and looking for things like for-loops that depend on array sizes and reasoning based on your data structures what kind of input would result in trivial cases and what input would result in worst-cases.
I don't know how to programmatically solve this, but the first thing people do is that we sample the algorithm for certain patterns in the number of operations done, say 4n^2 + 2n + 1 we have 2 rules:
If we have a sum of terms, the term with the largest growth rate is kept, with other terms omitted.
If we have a product of several factors constant factors are omitted.
If we simplify f(x), where f(x) is the formula for number of operations done, (4n^2 + 2n + 1 explained above), we obtain the big-O value [O(n^2) in this case]. But this would have to account for Lagrange interpolation in the program, which may be hard to implement. And what if the real big-O value was O(2^n), and we might have something like O(x^n), so this algorithm probably wouldn't be programmable. But if someone proves me wrong, give me the code . . . .
For code A, the outer loop will execute for n+1 times, the '1' time means the process which checks the whether i still meets the requirement. And inner loop runs n times, n-2 times.... Thus,0+2+..+(n-2)+n= (0+n)(n+1)/2= O(n²).
For code B, though inner loop wouldn't step in and execute the foo(), the inner loop will be executed for n times depend on outer loop execution time, which is O(n)
I am trying to learn functional programming and algorithms at the same time, and Ive implemented a merge sort in Haskell. Then I converted the style into python and run a test on a learning platform, but I get return that it takes too long time to sort a list on a 1000 integers.
Is there a way i can optimize my python code and still keep my functional style or do I have to solve the problem iteratively?
Thanks in advance.
So here is the code I made in Haskell first.
merge :: Ord a => [a] -> [a] -> [a]
merge [] xs = xs
merge ys [] = ys
merge (x:xs) (y:ys)
| (x <= y) = x : (merge xs (y:ys))
| otherwise = y : (merge (x:xs) ys)
halve :: [a] -> ([a] , [a])
halve [x] = ([x], [])
halve xs = (take n xs , drop n xs)
where n = length xs `div` 2
msort :: Ord a => [a] -> [a]
msort [x] = [x]
msort [] = []
msort xs = merge (msort n) (msort m)
where (n,m) = halve xs
Then I made this code in python based on the Haskell style.
import sys
sys.setrecursionlimit(1002) #This is because the recursion will go 1002 times deep when I have a list on 1000 numbers.
def merge(xs,ys):
if len(xs) == 0:
return ys
elif len(ys) == 0:
return xs
else:
if xs[0] <= ys[0]:
return [xs[0]] + merge(xs[1:], ys)
else:
return [ys[0]] + merge(xs, ys[1:])
def halve(xs):
return (xs[:len(xs)//2],xs[len(xs)//2:])
def msort(xss):
if len(xss) <= 1:
return xss
else:
xs,ys = halve(xss)
return merge(msort(xs), msort(ys))
Is there a smarter way I can optimize the python version and still have a functional style?
Haskell lists are lazy. [x] ++ xs first produces the x, and then it produces all the elements in xs.
In e.g. Lisp the lists are singly-linked lists and appending them copies the first list, so prepending a singleton is an O(1) operation.
In Python though the appending copies the second list (as confirmed by #chepner in the comments), i.e. [x] + xs will copy the whole list xs and thus is an O(n) operation (where n is the length of xs).
This means that both your [xs[0]] + merge(xs[1:], ys) and [ys[0]] + merge(xs, ys[1:]) lead to quadratic behavior which you observe as the dramatic slowdown you describe.
Python's equivalent to Haskell's lazy lists is not lists, it's generators, which produce their elements one by one on each yield. Thus the rewrite could look something like
def merge(xs,ys):
if len(xs) == 0:
return ys
elif len(ys) == 0:
return xs
else:
a = (x for x in xs) # or maybe iter(xs)
b = (y for y in ys) # or maybe iter(ys)
list( merge_gen(a,b))
Now what's left is to re-implement your merge logic as merge_gen which expects two generators (or should that be iterators? do find out) as its input and generates the ordered stream of elements which it gets by pulling them one by one from the two sources as needed. The resulting stream of elements is converted back to list, as expected by the function's caller. No redundant copying will be performed.
If I've made some obvious Python errors, please treat the above as a pseudocode.
Your other option is to pre-allocate a second list of the same length and copy the elements between the two lists back and forth while merging, using indices to reference the elements of the arrays and mutating the contents to store the results.
I am no Haskell expert, so I might be missing something. Here's my best gamble:
Haskell list's are not state-aware. One implication of that is that lists can be shared. That make the action of halving leaner on memory allocations - To produce a 'drop n xs' you only have to allocate one list node (or whatever they are called in Haskell) and point it to the list element in the (n 'div' 2) + 1 node on the pre-halved list.
Note that 'take' is not able to do this little trick - it is not allowed to change a state of any node in the list, and hence it has to allocate new node object with equal values to the first n div 2 elements in the pre-halved list.
Now look at the python equivalent of that function - to halve the list, you use the list slicing:
def halve(xs):
return (xs[:len(xs)//2],xs[len(xs)//2:])
Here you allocate two lists instead of one - in every level of the recursion tree! (I am also pretty sure that a list is a much more complex thing in python than the Haskell list, so probably allocation is slower, too)
What I would do:
Check my gamble - use the time module to see if your code spends too long allocating those lists, compared to the overall running time.
In case my gamble proved correct - avoid those allocations. A (Not very elegant, but probably fast) way to work around it - Pass a list, alongside with indices that indicate where each halve begin and where it ends. Work with offsets instead of allocating a new list each time. (EDIT:) You can avoid similar allocations as well - whenever you want to slice, pass an index to the begin\end of the new list.
And a last word - one of the requirements you've mentioned is keeping the functional approach. One can interpret that as keeping your code side-effect free.
To do so, I'd define an output list, and store the elements you merge in it. Combined with the index approach, that will not change the state of the input list, and will produce a new sorted output list.
EDIT:
Another thing worth mentioning here: python lists are not singly linked-list, like Haskell lists. They are a data structure more commonly called Dynamic Arrays. This means that stuff like slicing, deleting an object from the middle of the list, etc. is expensive, since it has implication on ALL objects in the array. On the other hand, you are allowed to access an object at the i-th index in O(1). You should keep that in mind, it is closely related to the problem you came Up with.
It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:
1_000_000_000_000_000 in range(1_000_000_000_000_001)
Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).
I have also tried things like this, but the calculation is still almost instant:
# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)
If I try to implement my own range function, the result is not so nice!
def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
What is the range() object doing under the hood that makes it so fast?
Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.
The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.
The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.
From the range() object documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
So at a minimum, your range() object would do:
class my_range:
def __init__(self, start, stop=None, step=1, /):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1
def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step
def __len__(self):
return self.length
def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('my_range object index out of range')
def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0
This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.
I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.
* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.
The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.
You can tell this pretty easily:
>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]
If it were a generator, iterating it once would exhaust it:
>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]
What range actually is, is a sequence, just like a list. You can even test this:
>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True
This means it has to follow all the rules of being a sequence:
>>> a[3] # indexable
3
>>> len(a) # sized
5
>>> 3 in a # membership
True
>>> reversed(a) # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3) # implements 'index'
3
>>> a.count(3) # implements 'count'
1
The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.
(As a side note, if you print(iter(a)), you'll notice that range uses the same listiterator type as list. How does that work? A listiterator doesn't use anything special about list except for the fact that it provides a C implementation of __getitem__, so it works fine for range too.)
Now, there's nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?
But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhari points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)
If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
Use the source, Luke!
In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:
Check that the number is between start and stop, and
Check that the stride value doesn't "step over" our number.
For example, 994 is in range(4, 1000, 2) because:
4 <= 994 < 1000, and
(994 - 4) % 2 == 0.
The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:
static int
range_contains_long(rangeobject *r, PyObject *ob)
{
int cmp1, cmp2, cmp3;
PyObject *tmp1 = NULL;
PyObject *tmp2 = NULL;
PyObject *zero = NULL;
int result = -1;
zero = PyLong_FromLong(0);
if (zero == NULL) /* MemoryError in int(0) */
goto end;
/* Check if the value can possibly be in the range. */
cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
if (cmp1 == -1)
goto end;
if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}
else { /* negative steps: stop < ob <= start */
cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
}
if (cmp2 == -1 || cmp3 == -1) /* TypeError */
goto end;
if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}
/* Check that the stride does not invalidate ob's membership. */
tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);
end:
Py_XDECREF(tmp1);
Py_XDECREF(tmp2);
Py_XDECREF(zero);
return result;
}
static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);
return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
The "meat" of the idea is mentioned in the comment lines:
/* positive steps: start <= ob < stop */
/* negative steps: stop < ob <= start */
/* result = ((int(ob) - start) % step) == 0 */
As a final note - look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):
>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
... pass
...
>>> x_ = MyInt(x)
>>> x in r # calculates immediately :)
True
>>> x_ in r # iterates for ages.. :(
^\Quit (core dumped)
To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):
static int
range_contains(rangeobject *r, PyObject *ob)
{
if (PyLong_CheckExact(ob) || PyBool_Check(ob))
return range_contains_long(r, ob);
return (int)_PySequence_IterSearch((PyObject*)r, ob,
PY_ITERSEARCH_CONTAINS);
}
So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).
If it’s not an int object, it falls back to iterating until it finds the value (or not).
The whole logic could be translated to pseudo-Python like this:
def range_contains (rangeObj, obj):
if isinstance(obj, int):
return range_contains_long(rangeObj, obj)
# default logic by iterating
return any(obj == x for x in rangeObj)
def range_contains_long (r, num):
if r.step > 0:
# positive step: r.start <= num < r.stop
cmp2 = r.start <= num
cmp3 = num < r.stop
else:
# negative step: r.start >= num > r.stop
cmp2 = num <= r.start
cmp3 = r.stop < num
# outside of the range boundaries
if not cmp2 or not cmp3:
return False
# num must be on a valid step inside the boundaries
return (num - r.start) % r.step == 0
If you're wondering why this optimization was added to range.__contains__, and why it wasn't added to xrange.__contains__ in 2.7:
First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)
Meanwhile:
Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:
Range objects have very little behavior: they only support indexing, iteration, and the len function.
This wasn't quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.
Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2's range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)
So, there were two chances to get this optimization backported to 2.7, but they were both rejected.
* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.
** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn't apply.
The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:
>>> r = range(5)
>>> for i in r:
print(i, 2 in r, list(r))
0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]
As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.
It's all about a lazy approach to the evaluation and some extra optimization of range.
Values in ranges don't need to be computed until real use, or even further due to extra optimization.
By the way, your integer is not such big, consider sys.maxsize
sys.maxsize in range(sys.maxsize) is pretty fast
due to optimization - it's easy to compare given integer just with min and max of range.
but:
Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.
(in this case, there is no optimization in range, so if python receives unexpected Decimal, python will compare all numbers)
You should be aware of an implementation detail but should not be relied upon, because this may change in the future.
TL;DR
The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.
But it also implements the __contains__ interface which is actually what gets called when an object appears on the right-hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).
Due to optimization, it is very easy to compare given integers just with min and max range.
The reason that the range() function is so fast in Python3 is that here we use mathematical reasoning for the bounds, rather than a direct iteration of the range object.
So for explaining the logic here:
Check whether the number is between the start and stop.
Check whether the step precision value doesn't go over our number.
Take an example, 997 is in range(4, 1000, 3) because:
4 <= 997 < 1000, and (997 - 4) % 3 == 0.
Try x-1 in (i for i in range(x)) for large x values, which uses a generator comprehension to avoid invoking the range.__contains__ optimisation.
TLDR;
the range is an arithmetic series so it can very easily calculate whether the object is there. It could even get the index of it if it were list like really quickly.
__contains__ method compares directly with the start and end of the range