Finding time complexity of a specific python algorithm [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Hi I need to understand what is the time complexity of the following algorithm.
def complex(n):
    l=[]
    i=1
    while i<n:
        l=list(range(i))
        i*=2
I've got to the point where I realized it runs int(log(n,2)) times over the loop but I'm having hard time incorporating the range(i) into the final expression.
Any help appreciated thank you.

You've already worked out that it runs int(log(n, 2)) iterations. (You can test that very easily by just adding a counter into the loop, and calling it with, e.g., 1, 2, 4, 8, 16, 32, 64, etc., and seeing that the counter goes up 1 every time n doubles.)
Now you want to know how long the inside of the loop takes. Here, you'd need to know the time complexity of the range and list functions. I can give you the answers to those, and in fact you might be able to guess them, but you can't really prove that unless you start reading the source code to CPython. So, let's test it with some simple timing:
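import timeit
for i in range(20):
    n = 1 << i
    # number=100 keeps the run short; the default of 1,000,000 repetitions takes far too long for large n
    t = timeit.timeit(lambda: list(range(n)), number=100)
    print('{} takes {}'.format(n, t))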
If you run this, you'll see that, once you get beyond around 32, doubling n seems to double the time it takes. So, that means list(range(n)) is O(n), right?
Let's verify whether that makes sense. I don't know whether you're using Python 2.x or 3.x, so I'll work it out both ways.
In 2.x: range(n) has to calculate n integers, and build a list n values long. That seems like it ought to be O(n).
In 3.x: range(n) just returns an object that remembers the number n. That ought to be O(1). But then we call list on that range, which has to iterate the whole range, calculating all n integers, and building a list n values long. So it's still O(n).
Put that back into your loop, and you have O(log n) times through the loop, each one O(i) complexity. So, the total time is O(1) + O(2) + O(4) + O(…) + O(n/4) + O(n/2) + O(n), with log(n) steps in the summation. In other words, it's the sum of a geometric sequence. And now you can solve the problem. (Or, if not, you're stuck on a new part, which someone can answer for you very simply if you can't figure it out yourself.)
You worked out that the sum is -(1-2**log(n,2)). That's not quite right, because you wanted a closed range, not a half-open range, so there is one more term and it should be -(1-2**(log(n,2)+1)). But that's probably my fault for not explaining it clearly, and it doesn't matter too much, so let's go with your version first.
2**log(n, 2) is obviously n. (If you don't understand exponentiation and logarithms well enough to understand why, you should find a tutorial on the math, but meanwhile you can test it with a variety of different values of n to convince yourself.)
Meanwhile, -(1-x) for any x is just x-1.
So, the sum is just n-1.
If you go back and use the correct log(n, 2) + 1 instead of log(n, 2), you'll get 2n-1.
So, is that correct? Let's test with some actual numbers.
If n = 16, you get 1+2+4+8+16 = 31 = 2n-1. If n = 1024, you get 1+2+4+…+256+512+1024 = 2047 = 2n-1. Any power-of-2 you throw at it, you get exactly the right answer. For a non-power-of-2, like 1000, you get 1+2+4+…+256+512+1000 = 2023, which is not exactly 2n-1, but it's always within a factor of 2. (In fact, it's n + 2**ceil(log(n, 2)) - 1, or n + m - 1 where m is n rounded up to a power of 2.)
Anyway, n-1, 2n-1, n + 2**ceil(log(n, 2)) - 1… those are all O(n).
And you can go back and test this by timing the whole function with different values of n and see that, beyond very small numbers, when you double n it takes about twice as long.
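If you'd rather count than time, here is a minimal sketch that adds up how many list elements the function in the question creates. The helper name total_elements is made up for illustration. It follows the question's loop exactly (while i < n), so for a power of two it reports n-1 rather than 2n-1; either way the total grows linearly with n.

```python
def total_elements(n):
    """Total number of elements built by list(range(i)) in the question's loop."""
    total = 0
    i = 1
    while i < n:
        total += i  # list(range(i)) creates i elements
        i *= 2
    return total

# Doubling n should roughly double the total.
for n in (16, 32, 64, 1024):
    print(n, total_elements(n))
```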

This one is trickier than it looks, because list(range(i)) is based on i, which grows geometrically. There is a simple way to look at the problem, however. How many elements are created, in total, relative to n? You can simplify the problem by assuming convenient values of n, i.e., powers of two.
If you're still stuck, start small and go up. If n = 1, how many elements are created in total? Then see what answer you get for n = 2, n = 4, n = 8, and so on. The pattern should become obvious fairly quickly.

Related

How to find the numbers that their sum equal to a given number?

This seems like a repost but it's not. Most people asked for pairs of numbers. I need all the numbers, not just two, like n0+n1+n2+n3+n4+...+n100=x. Also, each number must be lower than the number that comes before it. For example, for 8 the output must be:
There are 5:
7+1
5+3
6+2
5+2+1
4+3+1
In here you can see 5>3 and 6>2 and vice versa.
I came so close with this code. I found it on Stack Overflow and then adapted it to my needs. However, I tested it on a supercomputer, and if you give it 200, it says the result is wrong. I just don't get how this could be wrong. Can someone please help me improve this? On my PC it takes a lot of time: give it 50 and it will still take hours. Here is my code:
from itertools import combinations
def solution(N):
    array=[]
    counter=0
    for i in range(N):
        if(i==0):
            continue
        array.append(i)
    for K in range(N):
        for comb in combinations(array, K):
            if sum(comb) == N:
                counter+=1
                #print(comb) uncomment this if you like, it prints pairs.
    return counter
res=solution(50)
print(res)
By the way, can I do it without itertools at all? Maybe it causes problems.
I think I may have found a solution, I will share it and maybe you can check with me if it is correct :) I assumed that all numbers must be smaller than N.
To start, I am not so sure your code is necessarily wrong. I think it produces the right answers but maybe you should consider what is happening and why your code takes so long. Currently, you are checking for all combination of sums, but by iterating in another way you can exclude many possibilities. For example, suppose my goal is to find all sums that result in a total of 8.
Suppose I now have a sum of 6 + 5 = 11. Currently, you are still checking all other possibilities when adding other numbers (i.e. 6 + 5 + 4 and 6 + 5 + 3 etc etc), but we already know they all will be >8, hence we do not even have to compute them.
As a solution we can start with the highest number smaller than our goal, i.e. 7 in our example. Then we will try all combinations with numbers smaller than this. As soon as our sum gets bigger than 8, we do not follow the trail further. This sounds a lot like recursing, which is how I currently implemented it.
To get an idea (I hope it is correct, I haven't tested it extensively):
def solution(goal, total_solutions=0, current_sum=0.0, current_value=None):
    if current_value is None:
        current_value = goal
    # Base condition
    if current_sum >= goal:
        if current_sum == goal:
            return total_solutions + 1
        return total_solutions
    for new_value in range(current_value - 1, 0, -1):
        total_solutions = solution(
            goal, total_solutions, current_sum + new_value, new_value
        )
    return total_solutions
res = solution(8)
print(res) # prints 5
So as an answer to your question, yes you can do it with itertools, but it will take a long time because you will be checking a lot of sums that you do not really need to check.
I compared this program with yours and it produces the same output up until N=30. But then your code really starts to blow up so I won't check further.
The answer you're looking for is in the post: programming challenge: how does this algorithm (tied to Number Theory) work?
The classical method starts taking some time around the 100th step, but memoization (also called dynamic programming) reduces the complexity drastically and lets your algorithm compute the 4000th step almost instantly.
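To make the memoization idea concrete, here is a minimal bottom-up sketch. It is not the code from the linked post; the function name count_distinct_sums and the table name ways are made up for illustration. It assumes, as in the question, that every number in a sum must be distinct and smaller than N.

```python
def count_distinct_sums(n):
    """Count the ways to write n as a sum of distinct positive integers, each smaller than n."""
    # ways[s] = number of ways to reach total s using the parts considered so far
    ways = [0] * (n + 1)
    ways[0] = 1
    for part in range(1, n):            # only parts strictly smaller than n
        # Walk the totals downwards so each part is used at most once.
        for s in range(n, part - 1, -1):
            ways[s] += ways[s - part]
    return ways[n]

print(count_distinct_sums(8))  # 5, matching the example in the question
```

The work is proportional to N squared instead of growing with the number of combinations, which is why large inputs such as 200 finish almost immediately.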

How can I get my function to add together its output?

So this is my line of code so far,
def Adder (i,j,k):
    if i<=j:
        for x in range (i, j+1):
            print(x**k)
    else:
        print (0)
What it's supposed to do is take inputs (i,j,k) so that each number in [i,j] is raised to the power k, and the results are added together. For example, Adder(3,6,2) would be 3^2 + 4^2 + 5^2 + 6^2 and eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of k, but I don't know how to make the function sum that output. So in the case of my given example, my output would be 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use built-in function sum()
def adder(i,j,k):
    if i <= j:
        print(sum(x**k for x in range(i,j+1)))
    else:
        print(0)
The documentation is here
I'm not sure if this is what you want but
if i<=j:
    total = 0  # named total so it does not shadow the built-in sum()
    for x in range (i, j+1):
        total = total + x**k  # or total += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once loop is finished, print result to get the summed up answer.
A thing to note:
Consider where in your code you need to set and initialize the _result_ variable.
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be made aware I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, recognize that the solution is an iterative type of algorithm, where the same calculation is executed repeatedly over cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2)+ 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (like using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself and each time it is called it does the following:
1. calculates j raised to the power k (the square of j, in this example), and
2. adds it to the value returned from calling itself with j decremented by 1, until
3. j < i, at which point it returns 0.
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
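A minimal sketch of that recursive version, just to illustrate the idea (the name adder_recursive is made up here, and for large ranges the loop is the better choice because Python limits recursion depth):

```python
def adder_recursive(i, j, k):
    """Sum x**k for x in [i, j] by recursing on j."""
    if j < i:
        # Limiting condition: nothing left to add.
        return 0
    # The sum over [i, j-1], plus the last term j**k.
    return adder_recursive(i, j - 1, k) + j ** k

print(adder_recursive(3, 7, 2))  # 135
```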
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i,j,k):
    return sum(x**k for x in range(i,j+1)) if i<=j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the built-in function sum, which can operate over a list as well as any other iterable. We feed it as argument a generator expression, which "drip feeds" the sum function with x^k values as it iterates over range(i, j+1). In cases when N (which is j-i) is very large, building a full list first can result in significant memory overhead. A generator expression avoids this, because it produces each value only when it is needed instead of storing them all at once.
Of course it only does all this if i <= j else it will return 0.
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this,
if i<=j:
    s = 0
    for x in range (i, j+1):
        s += x**k
    return s # print(s) if you really want to
else:
    return 0
Usually functions do not print anything. Instead they return values for their caller to either print or further process. For example, someone may want to find the value of Adder(3, 6, 2)+1, but if you return nothing, they have no way to do this, since the result is not passed to the program. A side note, do not capitalize functions. Those are for classes.

Code finding the first triangular number with more than 500 divisors will not finish running

Okay, so I'm working on Euler Problem 12 (find the first triangular number with a number of factors over 500) and my code (in Python 3) is as follows:
factors = 0
y=1
def factornum(n):
    x = 1
    f = []
    while x <= n:
        if n%x == 0:
            f.append(x)
        x+=1
    return len(f)
def triangle(n):
    t = sum(list(range(1,n)))
    return t
while factors<=500:
    factors = factornum(triangle(y))
    y+=1
print(y-1)
Basically, a function goes through all the numbers below the input number n, checks if they divide into n evenly, and if so add them to a list, then return the length in that list. Another generates a triangular number by summing all the numbers in a list from 1 to the input number and returning the sum. Then a while loop continues to generate a triangular number using an iterating variable y as the input for the triangle function, and then runs the factornum function on that and puts the result in the factors variable. The loop continues to run and the y variable continues to increment until the number of factors is over 500. The result is then printed.
However, when I run it, nothing happens - no errors, no output, it just keeps running and running. Now, I know my code isn't the most efficient, but I left it running for quite a bit and it still didn't produce a result, so it seems more likely to me that there's an error somewhere. I've been over it and over it and cannot seem to find an error.
I'd merely request that a full solution or a drastically improved one isn't given outright but pointers towards my error(s) or spots for improvement, as the reason I'm doing the Euler problems is to improve my coding. Thanks!
You have a very inefficient algorithm.
Since you asked for pointers rather than a full solution, the main pointers are:
There is a more efficient way to calculate the next triangular number. There is an explicit formula in the wiki. Also, if you generate the sequence of all triangular numbers, it is more efficient to just add the next n to the previous number. (Side note: the list in sum(list(range(1,n))) makes no sense to me at all. If you want to use this approach anyway, sum(range(1,n)) will probably be more efficient, as it doesn't require materializing the range as a list; in Python 2, use xrange.)
There are much more efficient ways to factorize numbers.
There is a more efficient way to calculate the number of factors: the divisor function gives the count directly from the prime factorization.
Generally, Project Euler problems (as in many other programming competitions) are not supposed to be solvable by sheer brute force. You should come up with some formula and/or a more efficient algorithm first.
As far as I can tell your code will work, but it will take a very long time to calculate the number of factors. For 150 factors, it takes on the order of 20 seconds to run, and that time will grow dramatically as you look for higher and higher number of factors.
One way to reduce the processing time is to reduce the number of calculations that you're performing. If you analyze your code, you're calculating n%1 every single time, which is an unnecessary calculation because you know every single integer will be divisible by itself and one. Are there any other ways you can reduce the number of calculations? Perhaps by remembering that if a number is divisible by 20, it is also divisible by 2, 4, 5, and 10?
I can be more specific, but you wanted a pointer in the right direction.
From the looks of it the code works fine, it's just not the best approach. A simple optimization is to check divisors only up to half the number, for example. Also, try thinking about how you could do this using prime factors; it might be another solution. Best of luck!
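As a rough sketch of the prime-factor hint (not a tuned solution; the function names are made up for illustration): if n = p1^a1 * p2^a2 * ..., then n has (a1+1)*(a2+1)*... divisors, so you only need to divide out primes up to the square root of n. The triangular numbers are built with a running total instead of re-summing a range each time. The demo uses a small threshold; the problem statement itself gives 28 as the first triangular number with more than five divisors.

```python
def count_divisors(n):
    """Count the divisors of n (n >= 1) from its prime factorization."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1   # a prime with exponent a contributes a factor of (a + 1)
        p += 1
    if n > 1:
        count *= 2              # one remaining prime factor with exponent 1
    return count

def first_triangular_with_more_divisors_than(threshold):
    """Return the first triangular number with more than `threshold` divisors."""
    triangular = 0
    k = 0
    while True:
        k += 1
        triangular += k         # running total: T(k) = T(k-1) + k
        if count_divisors(triangular) > threshold:
            return triangular

print(first_triangular_with_more_divisors_than(5))  # 28
```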
First you have to define a factor function:
from functools import reduce
def factors(n):
    # pair each divisor i up to sqrt(n) with its partner n//i
    return set(reduce(list.__add__,
                      ([i, n//i] for i in range(1, int(pow(n,0.5) + 1)) if n % i == 0)))
This will create a set and put all of the factors of the number n into it.
Second, use a while loop until you get more than 500 factors:
a = 1
x = 1
while len(factors(a)) < 501:
    x += 1
    a += x
This loop stops as soon as len(factors(a)) exceeds 500.
Then simply print(a) and you will get your answer.

Sum of primes below 2,000,000 in python

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Eratosthenes using Python, and the code I wrote works perfectly for numbers below 10,000.
However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.
Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.
Can someone please help me optimize my code, and clearly explain how it works along the way?
Here is my code:
primes_below_number = 2000000 # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2)) # creates a list excluding even numbers
pos = 0 # index position
sum_of_primes = 0 # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
    pos += 1
    number = numbers[pos] # moves to next prime in list numbers
    sum_of_primes += number # adds prime to total sum
    num = number
    while num < primes_below_number:
        num += number
        if num in numbers[:]:
            numbers.remove(num) # removes multiples of prime found
print sum_of_primes + 2
As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.
As you've seen, there are various ways to implement the Sieve of Eratosthenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.
Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.
Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.
We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.
# number to find summation of all primes below number
hi = 2000000
# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2))
for number in xrange(3, int(hi ** 0.5) + 1):
    if number not in numbers:
        #number must have been removed because it has a prime factor
        continue
    num = number
    while num < hi:
        num += number
        if num in numbers:
            # Remove multiples of prime found
            numbers.remove(num)
print 2 + sum(numbers)
You should find that this code runs in a few seconds; it takes around 5 seconds on my 2GHz single-core machine.
You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.
There's another small optimization that can be made to the inner while loop, but I'll let you figure that out for yourself. :)
First, removing numbers from the list will be very slow. Instead of this, make a list of booleans:
primes = [True] * primes_below_number
primes[0] = False
primes[1] = False
Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually multiply, of course; you'd repeatedly add p.)
At the end,
primes = [n for n in range(primes_below_number) if primes[n]]
This should be a great deal faster.
Second, you can stop looking once you find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.
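Putting those two points together, a minimal Python 3 sketch could look like the following. The function name sum_primes_below is made up for illustration, and starting the crossing-off at p*p rather than 2*p is an extra shortcut not mentioned above (smaller multiples of p were already crossed off by smaller primes).

```python
def sum_primes_below(limit):
    """Sum all primes strictly below limit using a list of booleans as the sieve."""
    if limit < 2:
        return 0
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    p = 2
    # Only primes up to the square root of limit need their multiples crossed off.
    while p * p < limit:
        if is_prime[p]:
            # range() steps by p, i.e. repeated addition rather than multiplication.
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
        p += 1
    return sum(n for n in range(limit) if is_prime[n])

print(sum_primes_below(10))  # 2 + 3 + 5 + 7 = 17
```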
Try using numpy, should make it faster. Replace range by xrange, it may help you.
Here's an optimization for your code:
import itertools
primes_below_number = 2000000
numbers = list(range(3, primes_below_number, 2))
pos = 0
while pos < len(numbers) - 1:
    number = numbers[pos]
    numbers = list(
        itertools.chain(
            itertools.islice(numbers, 0, pos + 1),
            itertools.ifilter(
                lambda n: n % number != 0,
                itertools.islice(numbers, pos + 1, len(numbers))
            )
        )
    )
    pos += 1
sum_of_primes = sum(numbers) + 2
print sum_of_primes
This is faster because:
The sum is moved outside the loop.
Instead of removing elements from a list we can just create another one; memory is not an issue here (I hope).
When creating the new list we build it by chaining two parts: the first part is everything up to and including the current number (we already checked those), and the second part is everything after the current number, keeping only the values that are not divisible by the current number.
Using itertools can make things faster since we'd be using iterators instead of looping through the whole list more than once.
Another solution would be to not remove parts of the list but disable them, like @saulspatz said.
And here's the fastest way I was able to find: http://www.wolframalpha.com/input/?i=sum+of+all+primes+below+2+million 😁
Update
Here is the boolean method:
import itertools
primes_below_number = 2000000
numbers = [v % 2 != 0 for v in xrange(primes_below_number)]
numbers[0] = False
numbers[1] = False
numbers[2] = True
number = 3
while number < primes_below_number:
    n = number * 3 # We already excluded even numbers
    while n < primes_below_number:
        numbers[n] = False
        n += number
    number += 1
    while number < primes_below_number and not numbers[number]:
        number += 1
sum_of_numbers = sum(itertools.imap(lambda index_n: index_n[1] and index_n[0] or 0, enumerate(numbers)))
print(sum_of_numbers)
This executes in seconds (took 3 seconds on my 2.4GHz machine).
Instead of storing a list of numbers, you can instead store an array of boolean values. This use of a bitmap can be thought of as a way to implement a set, which works well for dense sets (there aren't big gaps between the values of members).
An answer on a recent python sieve question uses this implementation python-style. It turns out a lot of people have implemented a sieve, or something they thought was a sieve, and then come on SO to ask why it was slow. :P Look at the related-questions sidebar from some of them if you want more reading material.
Finding the element that holds the boolean that says whether a number is in the set or not is easy and extremely fast. array[i] is a boolean value that's true if i is in the set, false if not. The memory address can be computed directly from i with a single addition.
(I'm glossing over the fact that an array of boolean might be stored with a whole byte for each element, rather than the more efficient implementation of using every single bit for a different element. Any decent sieve will use a bitmap.)
Removing a number from the set is as simple as setting array[i] = false, regardless of the previous value. No searching, no comparison, no tracking of what happened, just one memory operation. (Well, two for a bitmap: load the old byte, clear the correct bit, store it. Memory is byte-addressable, but not bit-addressable.)
An easy optimization of the bitmap-based sieve is to not even store entries for the even numbers, because there is only one even prime, and we can special-case it to double our memory density. Then the membership status of i is held in array[i/2]. (Dividing by powers of two is easy for computers. Other values are much slower.)
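Here is a rough Python 3 sketch of that odd-only layout. It uses a bytearray with one byte per odd number rather than a true one-bit-per-number bitmap, and the function name sum_primes_below_odd_only is made up for illustration. Index k stands for the odd number 2*k + 1, so the entry for an odd i lives at index i // 2.

```python
def sum_primes_below_odd_only(limit):
    """Sum primes strictly below limit, storing sieve flags for odd numbers only."""
    if limit <= 2:
        return 0
    size = limit // 2                 # number of odd values below limit: 1, 3, 5, ...
    is_prime = bytearray([1]) * size  # is_prime[k] describes the odd number 2*k + 1
    is_prime[0] = 0                   # 1 is not prime
    k = 1
    while (2 * k + 1) ** 2 < limit:
        if is_prime[k]:
            p = 2 * k + 1
            # Odd multiples of p are 2*p apart in value, which is p apart in index.
            for idx in range(p * p // 2, size, p):
                is_prime[idx] = 0
        k += 1
    # 2 is the only even prime, so it is added as a special case.
    return 2 + sum(2 * k + 1 for k in range(size) if is_prime[k])

print(sum_primes_below_odd_only(10))  # 17
```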
An SO question, Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm?, has many links to good stuff about the sieve. This one in particular has some good discussion about it, in words rather than just code. (Never mind the fact that it's talking about a common Haskell implementation that looks like a sieve, but actually isn't. They call this the "unfaithful" sieve in their graphs, and so on.)
Discussion on that question brought up the point that trial division may be faster than big sieves, for some uses, because clearing the bits for all multiples of every prime touches a lot of memory in a cache-unfriendly pattern. CPUs are much faster than memory these days.

Python - Sum of numbers

I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
I am using python:
limit = 10
sums = []
for x in range(1,limit+1):
    for y in range(1,limit+1):
        sums.append(x+y)
This works just fine, however, because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Is there any way of doing this without a nested loop?
(This is just a simplification of something that I need to do to solve a ProjectEuler problem. It involves obtaining the sum of all abundant numbers.)
[x + y for x in xrange(limit + 1) for y in xrange(x + 1)]
This still performs just as many calculations but will do it about twice as fast as a for loop.
from itertools import combinations
(a + b for a, b in combinations(xrange(n + 1), 2))
This avoids a lot of duplicate sums. I don't know if you want to keep track of those or not.
If you just want every sum with no representation of how you got it, then xrange(2*n + 1) gives you what you want with no duplicates or looping at all.
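A quick check of that claim, written for Python 3 (range instead of xrange) and using the question's bounds of 1..limit rather than 0..n, so the distinct sums run from 2 to 2*limit:

```python
limit = 10

# Every pairwise sum, exactly as the nested loops in the question produce them.
nested_sums = [x + y for x in range(1, limit + 1) for y in range(1, limit + 1)]

# Dropping duplicates leaves nothing but a plain range of consecutive integers.
distinct_sums = sorted(set(nested_sums))

print(len(nested_sums))                                 # limit**2 sums in total
print(distinct_sums == list(range(2, 2 * limit + 1)))   # True
```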
In response to question:
[x + y for x in set1 for y in set2]
I am trying to sum all the numbers up to a range, with all the numbers up to the same range.
So you want to compute limit**2 sums.
because of the nested loops, if the limit is too big it will take a lot of time to compute the sums.
Wrong: it's not "because of the nested loops" -- it's because you're computing a quadratic number of sums, and therefore doing a quadratic amount of work.
Is there any way of doing this without a nested loop?
You can mask the nesting, as in @aaron's answer, and you can halve the number of sums you compute thanks to the problem's symmetry (though that doesn't do the same thing as your code), but, to prepare a list with a quadratic number of items, there's absolutely no way to avoid doing a quadratic amount of work.
However, for your stated purpose
obtaining the sum of all abundant numbers.
you'd need an infinite amount of work, since there's an infinity of abundant numbers ;-).
I think you have in mind problem 23, which is actually very different: it asks for the sum of all numbers that cannot be expressed as the sum of two abundant numbers. How the summation you're asking about would help you move closer to that solution really escapes me.
I'm not sure if there is a good way not using nested loops.
If I were in your shoes, I'd write it as follows:
[x+y for x in range(1,limit+1) for y in range(1,limit+1)]
