What is the runtime complexity (big O) of the following pseudocode? - python

I recently had a very, very intense debate about the runtime complexity of a super simple algorithm with a colleague of mine. In the end we agreed to disagree, but as I've been thinking about this, it has challenged my basic understanding of computer science fundamentals, so I must get additional insight on the matter.
Given the following python, what is the Big-O runtime complexity:
for c in "How are you today?":
print c
Now, I immediately called out that this is simply on the order of O(n), aka linear. Meaning it depends on the length of the string, so this loop will grow linearly as the length of the string grows.
My colleague then said, "No, it's constant because we know that for the set of all strings we are dealing with (in our case), the max string is always 255 characters long (in our case), therefore it must be constant." He followed on by saying "because we have a max upper-bound on character length of the string this results in O(255) which reduces to O(1)."
Anyways, we went back and forth, and after 45 minutes of both of us drawing sketches we deadlocked on the issue.
My question is: in what world or what math system is the loop above a constant-time loop? If we knew our upper bound was say 1,000,000 characters and the set of all strings could be anywhere from 0 to 1,000,000 characters long, this loop will obviously exhibit linear running time depending on the size of the string.
I additionally asked him if he also thinks the following code is O(1) if the upper-bound size of n is known. Meaning we are certain this code will only ever operate on a max upper-bound of say 255 characters:
s = "How are you today?"
for c in s:
for d in s:
print c+d
He said this is also constant time... even after I explained this is an O(n^2) algorithm and demonstrated that timing it for strings of increasing length produces a quadratic curve.
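Something along these lines (a rough sketch of the timing demonstration, not the exact code I used; the absolute numbers will vary by machine):

import time

def single_loop(s):
    for c in s:        # O(n): one pass over the string
        pass

def double_loop(s):
    for c in s:        # O(n^2): every pair of characters
        for d in s:
            pass

for n in (500, 1000, 2000, 4000):
    s = "x" * n
    t0 = time.perf_counter()
    single_loop(s)
    t1 = time.perf_counter()
    double_loop(s)
    t2 = time.perf_counter()
    # doubling n roughly doubles the first time and quadruples the second
    print(n, t1 - t0, t2 - t1)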
So, am I missing some theoretical concept where any of the above is true depending on how the theory goes? Just to be clear, his understanding is that I am correct if n is not known. If the upper bound of n is always known, he asserts that the two algorithms in this post both have constant runtime complexity.
Just looking to maintain my sanity, but perhaps if I'm wrong there's some additional learning I can benefit from. My good, good colleague was very convincing. Also, if anybody has additional links or material on the subject specific to this question, please add them in the comments.

Applying Big-O notation to a single scenario in which all the inputs are known is ludicrous. There is no Big-O for a single case.
The whole point is to get a worst-case estimate for arbitrarily large, unknown values of n. If you already know the exact answer, why on Earth would you waste time trying to estimate it?
Mathy / Computer-Sciencey Edit:
Big-O notation is defined as n grows arbitrarily large: f(n) is O(g(n)) if there exist a positive constant c and a threshold nMin such that f(n) ≤ c * g(n) for all n greater than nMin. The flip side is what matters in your argument: to show that one function grows strictly faster than another, your "opponent" can multiply his function by "eleventy-quadjillion" and it doesn't matter, because for all points "to the right" of some nMin, the graph of "eleventy-quadjillion times his function" will still lag below yours... forever.
Example: 2^n is less than or equal to n^2... for a short segment of the x-axis that includes n = 2, 3, and 4 (at n = 3, 2^n is 8, while n^2 is 9). This doesn't change the fact that their Big-O relationship is the opposite: O(2^n) is much greater than O(n^2), because Big-O says nothing about n values less than nMin. If you set nMin to 4 (thus ignoring the graph to the left of 4), you'll see that the n^2 line never exceeds the 2^n line.
If your "opponent" multiplies n2 by some larger constant c to raise "his" n2 line above your 2n line, you haven't lost yet... you just slide nMin to the right a bit. Big-O says that no matter how big he makes c, you can always find a point after which his equation loses and yours wins, forever.
But, if you constrain n on the right, you've violated the prerequisites for any kind of Big-O analysis. In your argument with your co-worker, one of you invented an nMax, and then the other set nMin somewhere to the right of it --- surprise, the results are nonsensical.
For instance, the first algorithm you showed does indeed do about n work for inputs of length n... in the general case. If I were building my own algorithm that called it n times, I would have to consider mine a quadratic O(n^2) algorithm... again, in the general case.
But if I could prove that I would never call your algorithm with an input greater than say 10 (meaning I had more information, and could thus estimate my algorithm more precisely), using Big-O to estimate your algorithm's performance would be throwing away what I'd learned about its actual behavior in the case I care about. I should instead replace your algorithm with a suitably large constant, changing my algorithm from c * n^2 to c * 10 * n... which is just cBigger * n. I could honestly claim my algorithm is linear, because in this case, your algorithm's graph will never rise above that constant value. This would change nothing about the Big-O performance of your algorithm, because Big-O is not defined for constrained cases like this.
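A minimal sketch of that argument (the function names and the cap of 10 are purely for illustration):

def inner(s):
    # linear in len(s) in the general case
    count = 0
    for c in s:
        count += 1
    return count

def outer(items):
    # Calls inner() once per item. If the strings can be arbitrarily long,
    # this is O(n * m) for n items of length up to m. But if every string
    # is known to satisfy len(s) <= 10, each inner() call is bounded by a
    # constant, and outer() is O(n) in the number of items.
    return sum(inner(s) for s in items)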
To wrap up: In general, that first algorithm you showed was linear by Big-O standards. In a constrained case, where the maximum input is known, it is a mistake to speak of it in Big-O terms at all. In a constrained case, it could legitimately be replaced by some constant value when discussing the Big-O behavior of some other algorithm, but that says absolutely nothing about the Big-O behavior of the first algorithm.
In conclusion: O(Ackermann(n)) works fine when nMax is small enough. Very, very small enough...

In your case...
I am tempted to say that your friend is softly wrong, and that's because of the considerably large constant of 256 hidden inside the O(1) run time. Your friend said that the execution was O(256), and because we ignore constants in Big-O, we simply write O(256 * 1) as O(1). It is up to you to decide whether this constant is negligible for you or not.
I have two strong reasons to say that you are right:
Firstly, for various values of n, your answer of O(n) (in first code) gives a better approximation of the running-time. For example:
For a string of length 4: you say the run time is proportional to 4, while your friend says it is proportional to 1 (or 256).
For a string of length 255: you say the running time is proportional to 255, while your friend again says that it is constant time.
Clearly, your answer is more accurate in every case, even though his answer is not outright wrong.
Secondly, if you go by your friend's method, then in one sense you can cheat and say that, since no string can exceed your RAM + disk size, all processing is O(1). And that's when the fallacy of your friend's reasoning becomes visible. Yes, he is right that the running time (assuming a 1 TB hard disk and 8 GB of RAM) is O((1TB + 8GB) * 1) = O(1), but you simply cannot ignore the size of the constant in this scenario.
The Big-O complexity does not tell you the actual execution time, only the rate of growth of the running time as the value of n increases.

I think you're both right.
The runtime of the first algorithm is linear in the size of its input. However, if its input is fixed, then its runtime is also fixed.
Big O is all about measuring the behavior of an algorithm as its input changes. If the input never changes, then Big O is meaningless.
Also: O(n) only gives an upper bound on the complexity. If you want to express a tight bound, the more precise notation is Θ(n) (theta notation).
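For reference, the usual formal definitions of the two notations (standard textbook statements, written here in LaTeX):

f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0

f(n) \in \Theta(g(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 \text{ such that } c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n) \text{ for all } n \ge n_0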

You're both right in a way, but you're more right than your colleague. (EDIT: Nope. On further thought, you're right and your colleague is wrong. See my comment below.) The question really isn't whether N is known, but whether N can change. Is s the input to your algorithm? Then it's O(N) or O(N^2): you know the value of N for this particular input, but a different input would have a different value, so knowing N for this input isn't relevant.
Here's the difference in your two approaches. You're treating this code as if it looked like this:
def f(s):
    for c in s:
        print c
f("How are you today?")
But your colleague is treating it like this:
def f(some_other_input):
    for c in "How are you today?":
        print c
f("A different string")
In the latter case, that for loop should be considered O(1), because it's not going to change with different inputs. In the former case, the algorithm is O(N).

Related

Big O time complexity for limited input

I am new to Big O time complexity...
If I have a function that calculates whether a number is prime, but I am told that the input parameter, p, will ALWAYS be less than 100, does that mean that the overall Big O time complexity is constant, i.e. O(1)?
Meaning, even though the worst case is that it has to check every number up to p // 2, in the worst case p is 100, so it runs 100 // 2 = 50 times, and that is constant, O(1)?
I hope that makes sense!
No, it is not O(1).
Short answer: if p changes, the function takes longer to run. O(1) would mean that regardless of the value of p, the runtime would not change.
The purpose of O(n) is to describe the behavior of the function for varying input, so it gives you an understanding of how much slower it will run if you, for example, double n.
In fact, there are superior algorithms for determining whether a number is prime, and the time each takes to run scales differently. As a result, it is important to correctly classify the complexity of what you have now in order to evaluate what is better.
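As a rough illustration (the function names below are made up for this sketch), compare checking divisors up to p // 2 with checking only up to sqrt(p):

def is_prime_halfway(p):
    # checks every candidate divisor up to p // 2: about p / 2 iterations, O(p)
    if p < 2:
        return False
    for d in range(2, p // 2 + 1):
        if p % d == 0:
            return False
    return True

def is_prime_sqrt(p):
    # only checks divisors up to sqrt(p): about sqrt(p) iterations, O(sqrt(p))
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

Counting loop iterations as p grows shows the first version doing roughly p / 2 of them and the second roughly sqrt(p); neither count is constant, which is exactly the point.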
This is a valid perspective, but unlikely to be the answer that your homework is looking for. You might get a bonus mark if you can frame it correctly but you should still answer the "real" question.
When I look up the definition of Big O, I see
... a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.
We normally think of this as "tends toward infinity", in which case you could say that this is O(1), i.e. the run time is constant once you pass the cap of 100. But if we use "a particular value" (the cap of 100) then no, this is not O(1).
To determine Big O we should consider the relationship between the input value N and the number of inner-loop iterations, up to and below the maximum value of 100. If you double N, what's the impact on the number of inner-loop iterations?

Why don't tree and ensemble based algorithms need feature scaling?

Recently, I've been interested in data analysis, so I researched how to do a machine-learning project and did one myself.
I learned that scaling is important when handling features, so I scaled every feature while using tree models like Decision Tree or LightGBM.
But the results were worse when I scaled.
I searched the Internet, and all I found is that tree and ensemble algorithms are not sensitive to the variance of the data.
I also bought the book "Hands-On Machine Learning" (O'Reilly), but I couldn't find a detailed enough explanation there.
Can I get a more detailed explanation of this?
Though I don't know the exact notations and equations, the answer has to do with the Big O Notation for the algorithms.
Big O notation is a way of expressing the theoretical worst-case time for an algorithm to complete over extremely large data sets. For example, a simple loop that goes over every item in a one-dimensional array of size n has an O(n) run time - which is to say that its running time is always proportional to the size of the array.
Say you have a two-dimensional array of x,y coordinates and you are going to loop across every potential combination of x/y locations, where x is of size n and y is of size m; your Big O would be O(mn)
and so on. Big O is used to compare the relative speed of different algorithms in abstraction, so that you can try to determine which one is better to use.
If you graph O(n) over the different potential sizes of n, you end up with a straight line on your graph.
As you get into more complex algorithms you can end up with O(n^2) or O(log n) or even more complex expressions. Generally, though, most algorithms fall into O(n), O(n^k) for some exponent k, O(log n) or O(sqrt(n)), with some coefficient in front that shifts where they sit on the graph. If you graph each of those curves you'll very quickly see which ones are better for extremely large data sets.
It would entirely depend on how well your algorithm is coded, but it might look something like this (don't trust me on this math; I tried to start doing it and then just googled it):
Fitting a decision tree of depth ‘m’:
Naïve analysis: 2^(m-1) trees -> O(2^(m-1) * n * d * log(n)).
With each object appearing only once at a given depth: O(m * n * d * log(n))
and a log(n) curve... well, it pretty much doesn't change at all even for very large n, does it?
So it doesn't matter how big your data set is; these algorithms are very efficient at what they do, and their cost barely grows, because of the nature of a log curve (the worst increase in time per additional n is at the very beginning; after that it levels off, with only extremely minor increases as n grows).
Do not confuse trees with ensembles (which may consist of models that do need scaling).
Trees do not need feature scaling, because at each node the set of observations is split on the value of one feature: roughly speaking, everything below a certain threshold goes to the left, and everything above it goes to the right. What difference does the chosen scale make, then?
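Here is a small sketch of why the split is scale-invariant (the helper below is purely illustrative, not how any particular library implements it): the split position that best separates the classes is the same whether the feature is measured in metres or millimetres.

def gini(labels):
    # Gini impurity of a set of class labels
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split_position(feature, labels):
    # Try every split between consecutive sorted feature values and
    # return the position that minimises the total weighted impurity.
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best_pos, best_score = None, float("inf")
    for pos in range(1, len(order)):
        left = [labels[i] for i in order[:pos]]
        right = [labels[i] for i in order[pos:]]
        score = gini(left) * len(left) + gini(right) * len(right)
        if score < best_score:
            best_pos, best_score = pos, score
    return best_pos

feature = [3.0, 1.0, 7.0, 5.0, 9.0]
labels = [0, 0, 1, 0, 1]
scaled = [x * 1000 for x in feature]            # rescaling the feature...
print(best_split_position(feature, labels))     # ...does not change
print(best_split_position(scaled, labels))      # the chosen split position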

Time and Space analysis in python [closed]

Can someone provide an example of O(log(n)) and O(nlog(n)) problems for both time and space?
I am quite new to this type of analysis and cannot see past polynomial time/space.
What I don't get is how you can have O(1) < O(log(n)) < O(n). Is that like "semi-constant"?
Additionally, I would appreciate any good examples which cover these cases (both time and space).
I find space analysis a bit more ambiguous, so it would be nice to see it compared with the time analysis in the same place - something I couldn't find reliably online.
Can you provide examples for each case in both space and time analysis?
Before examples, a little clarification on big O notation
Perhaps I'm mistaken, but seeing
What I don't get is how you can have O(1) < O(log(n)) < O(n). Is that like "semi-constant"?
makes me think that you have been introduced to the idea of big-O notation as the number of operations to be carried out (or the number of bytes to be stored, etc.), e.g. if you have a loop for(int i=0;i<n;++i) then there are n operations, so the time complexity is O(n). While this is a nice first intuition, I think it can be misleading, as big-O notation only defines an asymptotic upper bound.
Let's say that you have chosen an algorithm to sort an array of numbers, and let's denote by x the number of elements in that array and by f(x) the time complexity of that algorithm. Assume now that we say the algorithm is O(g(x)). What this means is that as x grows, we will eventually reach a threshold x_t such that if x_i > x_t, then abs(f(x_i)) will always be lower than or equal to alpha * g(x_i), where alpha is a positive real number.
As a result, a function that is O(1) doesn't always take the same constant time; rather, you can be sure that no matter how much data it is given, the time it takes to complete its task will be lower than some constant amount of time, e.g. 5 seconds. Similarly, O(log(n)) doesn't mean that there is any notion of a semi-constant. It just means that 1) the time the algorithm takes will depend on the size of the dataset that you feed it, and 2) if the dataset is large enough (i.e. n is sufficiently large), then the time it takes to complete will always be less than or equal to some constant times log(n).
Some examples regarding time complexity
O(1): Accessing an element from an array.
O(log(n)): binary search in a sorted array (a short sketch follows this list). Say you have an array of n elements sorted in increasing order and you want to find the index where the value equals x. You can start at the middle of the array; if the value v you read there is greater than x, you repeat the same process on the left half, and if it is smaller you look in the right half. You continue until the value you're looking for is found. As you can see, if you're lucky you can find the value at the middle of the array on the first try, or you may need about log(n) steps. So there is no semi-constancy; Big-O notation gives you the worst case.
O(nlogn): sorting an array using Heap sort. This is a bit too long to explain here.
O(n^2): computing the sum of all pixels of a square gray-scale image with side length n (which you can consider as a 2D matrix of numbers).
O(n^3): naively multiplying two matrices of size n*n.
O(n^{2+epsilon}): multiplying matrices in smart ways (see wikipedia)
O(n!): generating all permutations of n elements (e.g. a brute-force travelling-salesman search).
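As promised above, a minimal binary search sketch (assuming the array is sorted in increasing order; returns -1 if the value is not present):

def binary_search(arr, x):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # halve the remaining range each pass...
        if arr[mid] == x:
            return mid
        elif arr[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1                     # ...so at most O(log(n)) iterations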
Some examples regarding space complexity
O(1): Heapsort. One might think that since you need to remove elements from the root of the heap, you will need extra space. However, since a heap can be implemented as an array, you can store the removed values at the end of that same array instead of allocating new space.
An interesting example would be, I think, to compare two solutions to a classical problem: assume you have an array X of integers and a target value T, and that you are given the guarantee that there exist two values x, y in X such that x + y == T. Your goal is to find those two values.
One solution (known as two-pointers) would be to sort the array using heapsort (O(1) space) and then define two indexes i, j that point to the start and end of the sorted array X_sorted. Then, if X_sorted[i] + X_sorted[j] < T, we increment i, and if X_sorted[i] + X_sorted[j] > T, we decrement j. We stop when X_sorted[i] + X_sorted[j] == T. This requires no extra allocations, so the solution has O(1) space complexity. A second solution would be this:
def two_sum_dict(X, T):
    D = {}
    for i in range(len(X)):
        D[X[i]] = i                # value -> index
    for j, x in enumerate(X):
        y = T - x
        if y in D and D[y] != j:   # found the complement (not the same element)
            return X[D[y]], x
which has space complexity O(n) because of the dictionary.
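For comparison, a sketch of the two-pointers solution described above (the function name is just for this illustration, and Python's built-in sort is not actually heapsort, so treat the sort call as a stand-in for the in-place heapsort from the text):

def two_sum_pointers(X, T):
    # assumes a pair summing to T exists, as stated in the problem
    X.sort()
    i, j = 0, len(X) - 1
    while i < j:
        s = X[i] + X[j]
        if s == T:
            return X[i], X[j]
        elif s < T:
            i += 1
        else:
            j -= 1

Apart from the sort, it only ever keeps two indexes and one sum around, hence the O(1) extra space.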
The examples given for time complexity above are (except the one regarding efficient matrix multiplication) pretty straightforward to derive. As others have said, I think that reading a book on the subject is your best bet at understanding this topic in depth. I highly recommend Cormen's book.
Here is a rather trivial answer: whatever formula f(n) you have, the following algorithms run in O(f(n)) time and space respectively, so long as f itself isn't too slow to compute.
def meaningless_waste_of_time(n):
    m = f(n)
    for i in range(int(m)):
        print('foo')

def meaningless_waste_of_space(n):
    m = f(n)
    lst = []
    for i in range(int(m)):
        lst.append('bar')
For example, if you define f = lambda n: (n ** 2) * math.log(n) (with math imported), then the time complexity of the first function and the space complexity of the second will both be O(n² log n).
First of all, I would like to point out that we determine the time or space complexity of an algorithm, not of a programming language. If you want to reason about the time complexity of a program, I can only suggest you go with C; calculating time complexity in Python is technically very difficult.
Example:
Say you are building a list and then sorting it on every pass of a for loop, something like this:
n = int(input())
l = []
for i in range(n):
    l.append(int(input()))
    l = sorted(l)
Here, at first glance our intuition is that this has a time complexity of O(n), but on closer examination one notices that the sorted() function is called on every pass, and no comparison-based sorting algorithm can do better than O(n log n) (radix and counting sort, with O(kn) and O(n+k) time complexity, are the exceptions), so the time complexity of this code works out to O(n^2 log n) rather than O(n).
With this, I would suggest you read a good data structures and algorithms book for better understanding. You could go for a book that is prescribed in a B.Tech or B.E. curriculum. Hope this helps you :)

Linear time v.s. Quadratic time

Often, some of the answers mention that a given solution is linear, or that another one is quadratic.
How do you tell the difference / identify which is which?
Can someone explain this in the simplest possible way, for those like me who still don't know?
A method is linear when the time it takes increases linearly with the number of elements involved. For example, a for loop which prints the elements of an array is roughly linear:
for x in range(10):
    print x
because if we print range(100) instead of range(10), the time it takes to run is 10 times longer. You will very often see that written as O(N), meaning that the time or computational effort to run the algorithm is proportional to N.
Now, let's say we want to print the elements of two for loops:
for x in range(10):
    for y in range(10):
        print x, y
For every x, I loop over y 10 times. For this reason, the whole thing goes through 10x10 = 100 prints (you can see them just by running the code). If I use 100 instead of 10, the method will now do 100x100 = 10000 prints. In other words, the method goes as O(N*N) or O(N²), because every time you increase the number of elements, the computational effort or time increases as the square of the number of points.
They must be referring to run-time complexity also known as Big O notation. This is an extremely large topic to tackle. I would start with the article on wikipedia: https://en.wikipedia.org/wiki/Big_O_notation
When I was researching this topic one of the things I learned to do is graph the runtime of my algorithm with different size sets of data. When you graph the results you will notice that the line or curve can be classified into one of several orders of growth.
Understanding how to classify the runtime complexity of an algorithm will give you a framework to understanding how your algorithm will scale in terms of time or memory. It will give you the power to compare and classify algorithms loosely with each other.
I'm no expert but this helped me get started down the rabbit hole.
Here are some typical orders of growth (a small sketch comparing them numerically follows the list):
O(1) - constant time
O(log n) - logarithmic
O(n) - linear time
O(n^2) - quadratic
O(2^n) - exponential
O(n!) - factorial
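A quick, purely illustrative sketch that tabulates these growth rates for a few input sizes makes the differences obvious:

import math

growth = {
    "O(1)": lambda n: 1,
    "O(log n)": lambda n: math.log2(n),
    "O(n)": lambda n: n,
    "O(n^2)": lambda n: n ** 2,
    "O(2^n)": lambda n: 2 ** n,
    "O(n!)": lambda n: math.factorial(n),
}

for n in (4, 8, 16):
    row = ", ".join("%s=%d" % (name, f(n)) for name, f in growth.items())
    print("n=%d: %s" % (n, row))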
If the wikipedia article is difficult to swallow, I highly recommend watching some lectures on the subject on iTunes University and looking into the topics of algorithm analysis, big-O notation, data structures and even operation counting.
Good luck!
You usually argue about an algorithm in terms of its input size n (if the input is an array or a list). A linear solution to a problem would be an algorithm whose execution time scales linearly with n, so x*n + y, where x and y are real numbers. n appears with a highest exponent of 1: n = n^1.
With a quadratic solution, n appears in a term with 2 as the highest exponent, e.g. x*n^2 + y*n + z.
For large enough n, the linear solution grows in execution time much more slowly than the quadratic one.
For more information, look up Big O notation.
You do not specify, but as you mention a solution, it is possible you are asking about quadratic and linear convergence. To this end, if you have an iterative algorithm that generates a sequence of approximations to a limit, then you have quadratic convergence when you can show that
e(n+1) <= c * e(n)^2
for some positive constant c, where e(n) is the error at iteration n. That is to say, the error at iteration n+1 is at most a constant times the square of the error at iteration n. See this for a fuller introduction to more general convergence-rate definitions: http://en.wikipedia.org/wiki/Rate_of_convergence
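A quick way to see quadratic convergence in practice (an illustrative sketch using Newton's method on f(x) = x^2 - 2, whose root is sqrt(2)): the error is roughly squared at every step, so the number of correct digits roughly doubles per iteration.

import math

x = 1.0                               # starting guess for sqrt(2)
for k in range(6):
    x = x - (x * x - 2) / (2 * x)     # Newton step for f(x) = x^2 - 2
    err = abs(x - math.sqrt(2))
    print(k, x, err)                  # err is roughly squared each step,
                                      # until it hits machine precision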

Comparing Root-finding (of a function) algorithms in Python

I would like to compare different methods of finding roots of functions in Python (like Newton's method or other simple calculus-based methods). I don't think I will have too much trouble writing the algorithms.
What would be a good way to make the actual comparison? I read up a little bit about Big-O. Would this be the way to go?
The answer from #sarnold is right -- it doesn't make sense to do a Big-Oh analysis.
The principal differences between root finding algorithms are:
rate of convergence (number of iterations)
computational effort per iteration
what is required as input (i.e. do you need to know the first derivative, do you need to set lo/hi limits for bisection, etc.)
what functions it works well on (i.e. works fine on polynomials but fails on functions with poles)
what assumptions does it make about the function (i.e. a continuous first derivative or being analytic, etc)
how simple the method is to implement
I think you will find that each of the methods has some good qualities, some bad qualities, and a set of situations where it is the most appropriate choice.
Big O notation is ideal for expressing the asymptotic behavior of algorithms as the inputs to the algorithms "increase". This is probably not a great measure for root finding algorithms.
Instead, I would think the number of iterations required to bring the actual error below some epsilon ε would be a better measure. Another measure would be the number of iterations required to bring the difference between successive iterations below some epsilon ε. (The difference between successive iterations is probably a better choice if you don't have exact root values at hand for your inputs. You would use a criterion such as successive differences to know when to terminate your root finders in practice, so you could or should use it here, too.)
While you can characterize the number of iterations required for different algorithms by the ratios between them (one algorithm may take roughly ten times more iterations to reach the same precision as another), there often isn't "growth" in the iterations as inputs change.
Of course, if your algorithms take more iterations with "larger" inputs, then Big O notation makes sense.
Big-O notation is designed to describe how an algorithm behaves in the limit, as n goes to infinity. This is a much easier thing to work with in a theoretical study than in a practical experiment. I would pick things to study that you can easily measure and that people care about, such as accuracy and computer resources (time/memory) consumed.
When you write and run a computer program to compare two algorithms, you are performing a scientific experiment, just like somebody who measures the speed of light, or somebody who compares the death rates of smokers and non-smokers, and many of the same factors apply.
Try to choose an example problem or problems to solve that are representative, or at least interesting to you, because your results may not generalise to situations you have not actually tested. You may be able to increase the range of situations to which your results apply if you sample at random from a large set of possible problems and find that all your random samples behave in much the same way, or at least follow much the same trend. You can have unexpected results even when the theoretical studies show that there should be a nice n log n trend, because theoretical studies rarely account for suddenly running out of cache, or out of memory, or usually even for things like integer overflow.
Be alert for sources of error, and try to minimise them, or have them apply to the same extent to all the things you are comparing. Of course you want to use exactly the same input data for all of the algorithms you are testing. Make multiple runs of each algorithm, and check to see how variable things are - perhaps a few runs are slower because the computer was doing something else at the time. Be aware that caching may make later runs of an algorithm faster, especially if you run them immediately after each other. Which time you want depends on what you decide you are measuring. If you have a lot of I/O to do, remember that modern operating systems and computers cache huge amounts of disk I/O in memory. I once ended up powering the computer off and on again after every run, as the only way I could find to be sure that the device I/O cache was flushed.
You can get wildly different answers for the same problem just by changing starting points. Pick an initial guess that's close to the root and Newton's method will give you a result that converges quadratically. Choose another in a different part of the problem space and the root finder will diverge wildly.
What does this say about the algorithm? Good or bad?
I would suggest you have a look at the following Python root-finding demo.
It is simple code, with several different methods and comparisons between them (in terms of rate of convergence).
http://www.math-cs.gordon.edu/courses/mat342/python/findroot.py
I just finished a project comparing the bisection, Newton, and secant root-finding methods. Since this is a practical case, I don't think you need to use Big-O notation; Big-O notation is more suitable for an asymptotic view. What you can do is compare them in terms of:
Speed - for example, here Newton is the fastest if good conditions are met
Number of iterations - for example, here bisection takes the most iterations
Accuracy - how often it converges to the right root if there is more than one root, or whether it converges at all
Input - what information it needs to get started. For example, Newton needs an x0 near the root in order to converge; it also needs the first derivative, which is not always easy to find.
Other - rounding errors
For the sake of visualization you can store the value of each iteration in arrays and plot them. Use a function whose roots you already know; a small sketch of this kind of comparison follows.
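A minimal sketch of such a comparison (the function, interval, and starting points are just illustrative choices): record the iterates of each method on f(x) = x^3 - x - 2, which has a single real root near 1.5214, and watch how quickly the error shrinks.

def f(x):
    return x ** 3 - x - 2

def df(x):
    return 3 * x ** 2 - 1

def bisection(a, b, n):
    xs = []
    for _ in range(n):
        m = (a + b) / 2
        xs.append(m)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return xs

def newton(x0, n):
    xs, x = [], x0
    for _ in range(n):
        x = x - f(x) / df(x)
        xs.append(x)
    return xs

def secant(x0, x1, n):
    xs = []
    for _ in range(n):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                  # converged; avoid division by zero
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        xs.append(x2)
        x0, x1 = x1, x2
    return xs

# high-precision reference root, used only to measure the errors
root = 2.0
for _ in range(50):
    root = root - f(root) / df(root)

for name, xs in [("bisection", bisection(1, 2, 10)),
                 ("newton", newton(1.5, 10)),
                 ("secant", secant(1, 2, 10))]:
    errors = ["%.1e" % abs(x - root) for x in xs]
    print(name, errors)               # Newton and secant hit machine precision
                                      # in a handful of steps; bisection does not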
Although this is a very old post, my 2 cents :)
Once you've decided which method to use to compare them (your "evaluation protocol", so to speak), then you might be interested in ways to run your challengers on actual datasets.
This tutorial explains how to do it, based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)
