I have a very long loop, and I would like to check the status every N iterations. In my specific case, I have a loop of 10 million elements, and I want to print a short report every millionth iteration.
So currently I am doing just this (n is the iteration counter):
if n % 1000000 == 0:
    print('Progress report...')
but I am worried that I am slowing down the process by computing the modulus at each iteration, since one iteration lasts just a few milliseconds.
Is there a better way to do this? Or shouldn't I worry about the modulus operation at all?
How about keeping a counter and resetting it to zero when you reach the wanted number? Adding and checking equality are faster than computing a modulus.
printcounter = 0
# Whatever a while loop is in Python
while (...):
    ...
    if printcounter == 1000000:
        print('Progress report...')
        printcounter = 0
    ...
    printcounter += 1
Although it's quite possible that the interpreter is doing some sort of optimization like this for you already... this may give you some peace of mind.
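If you'd rather measure than guess, a minimal benchmark with the standard timeit module might look like this (the pass statements stand in for the real report, and the counts mirror the question; this is a sketch, not a definitive measurement):

import timeit

mod_version = '''
for n in range(10000000):
    if n % 1000000 == 0:
        pass  # stand-in for the progress report
'''

counter_version = '''
printcounter = 0
for n in range(10000000):
    if printcounter == 1000000:
        pass  # stand-in for the progress report
        printcounter = 0
    printcounter += 1
'''

# number=1 because each snippet already loops 10 million times itself
print('modulus:', timeit.timeit(mod_version, number=1))
print('counter:', timeit.timeit(counter_version, number=1))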
1. Human-language declarations for x and n:
let x be the number of iterations that have been examined at any given time.
let n be the multiple of iterations at which your code will be executed.
Example 1: "After x iterations, how many times was n done?"
Example 2: "It is the xth iteration and the action has occurred every nth time, so far."
2. What we're doing:
The first code block (Block A) uses only one variable, x (defined above), and uses 5 (an integer) rather than the variable n (defined above).
The second code block (Block B) uses both of the variables (x and n) that are defined above. The integer, 5, will be replaced by the variable, n. So, Block B literally performs an action at each nth iteration.
Our goal is to do one thing on every iteration and two things on every nth iteration.
We are going through 100 iterations.
Easy-to-understand attempt:
Block A, minimal variables:
for x in range(100):
    pass  # what to do every time (100 times in total): replace with your every-iteration functions
    if x % 5 == 0:
        pass  # what to do every 5th time: replace with your nth-iteration functions
Block B, generalization:
n = 5
for x in range(100):
    pass  # what to do every time (100 times in total): replace with your every-iteration functions
    if x % n == 0:
        pass  # what to do every nth time: replace with your nth-iteration functions
Please let me know if you have any issues, because I haven't had time to test it after writing it here.
3. Exercises
If you've done this properly, see if you can use it with the turtle.Pen() and turtle.forward() functions. For example, can you move the turtle forward 4 times and then turn right and move forward once? (A sketch of this one appears after the reading list below.)
See if you can use this program with the turtle.circle() function. For example, can you draw a circle with radius+1 4 times and a circle of a new color with radius+1 once?
Check out the reading (seen below) to attempt to improve the programs from exercises 1 and 2. I can't think of a good reason to be doing this: I just feel like it might be useful!
About modulo and other basic operators:
https://docs.python.org/2/library/stdtypes.html
http://www.tutorialspoint.com/python/python_basic_operators.htm
About turtle:
https://docs.python.org/2/library/turtle.html
https://michael0x2a.com/blog/turtle-examples
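For reference, here is one possible sketch of exercise 1 built on the Block B pattern (the 50-pixel step, the 90-degree turn, and the 20-step total are arbitrary choices, not part of the exercise):

import turtle

t = turtle.Pen()
n = 5  # on every 5th iteration, turn right before moving
for x in range(1, 21):
    if x % n == 0:
        t.right(90)   # the nth-iteration action
    t.forward(50)     # the every-iteration action
turtle.done()

With these numbers the turtle moves forward 4 times, then turns right and moves once, and the pattern repeats.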
Is it really slowing things down? You have to try and see for yourself. It won't be much of a slowdown, but if each iteration takes only nanoseconds, the check may be considerable. Alternatively, you can convert one 10-million-iteration loop into two smaller nested loops:
m = 1000000
for i in range(10):
    for j in range(m):
        pass  # do something
    print("Progress report")
It's difficult to know how your system will optimize your code without testing.
You could simplify the conditional by relying on the fact that zero evaluates as false:

if not n % 1000000:
    # do stuff
    pass
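In the context of the question, a minimal sketch of that (the loop bound and report text are taken from the question; note the check also fires at n == 0):

for n in range(10000000):
    if not n % 1000000:  # true exactly when n is a multiple of 1000000
        print('Progress report...')
    # ... the real work for this iteration ...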
Something like this?:
for n in xrange(1000000, 11000000, 1000000):
    for i in xrange(n - 1000000, n):
        x = 10 / 2
    print 'Progress at ' + str(i)
Result:
Progress at 999999
Progress at 1999999
Progress at 2999999
Progress at 3999999
Progress at 4999999
Progress at 5999999
Progress at 6999999
Progress at 7999999
Progress at 8999999
Progress at 9999999
EDIT
Better:
for n in xrange(0, 10000000, 1000000):
    for i in xrange(n, n + 1000000):
        x = 10 / 2
    print 'Progress at ' + str(i)
And, inspired by pajton:
m = 1000000
for n in xrange(0, 10*m, m):
    for i in xrange(n, n + m):
        x = 10 / 2
    print 'Progress at ' + str(i + 1)
I prefer this one, which I find more immediately readable than pajton's solution, and it keeps the displayed value dependent on i.
I'd do some testing to see how much time your modulus calls are consuming. You can use timeit for that. If your results indicate a need for time reduction, another approach which eliminates your modulus calculation:
for m in xrange(m_min, m_max):
    for n in xrange(n_min, n_max):
        # do_n_stuff
        pass
    print('Progress report...')
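If you haven't used timeit before, a quick way to isolate the cost of the modulus check itself might look like this (the sample value of n and the repetition count are my choices for illustration):

import timeit

# Time 10 million modulus checks on their own, apart from any loop body
t = timeit.timeit('n % 1000000 == 0', setup='n = 123456', number=10000000)
print('total: %.3f s, per check: %.1f ns' % (t, t / 10000000 * 1e9))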
It's fast enough that I wouldn't worry about it.
If you really wanted to speed it up, you could do this to avoid the modulus
if n == 1000000:
    n = 0
    print('Progress report...')
This makes the inner loop lean, and m does not have to be divisible by interval.
m = 10000000
interval = 1000000
i = 0
while i < m:
    checkpoint = min(m, i + interval)
    for j in xrange(i, checkpoint):
        # do something
        pass
    i = checkpoint
    print "progress"
When I'm doing timing/reports based on count iterations, I just divide my counter by the desired interval and check whether the result is an integer. So:
if n / 1000000 == int(n / 1000000):
    print(report)
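Two caveats worth noting here, as a sketch: this check only behaves as intended under true (float) division; with Python 2's integer division the comparison is always true. The plain modulus test is equivalent and stays in integer arithmetic:

n = 3000000
print(n / 1000000 == int(n / 1000000))  # True under true division
print(n % 1000000 == 0)                 # equivalent, integer-only check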
I just got this as an interview question, but got stumped on the mod. My answer was O(log10) because x is divided by 10 every time, but I couldn't explain it. So I switched my answer to O(n), but the interviewer asked what n represents, and I was going to say x, but that didn't make sense: if x is 1000, the code doesn't run 1000 times.
x is any number from 1 to 10000000.
def code(x):
    count = 0
    while x > 0:
        x = x // 10
        result = x % 7
        if (result % 7) == 0:
            count += 1
    return count
The while loop executes about log10(x) times. The rest are constant-time operations, so your time complexity is O(log x). Notice that in big-O notation the base of the logarithm does not matter.
A more complicated question is what value count will have with respect to x, but that is not what you asked for. Notice that those calculations (assuming constant-time arithmetic operations) take constant time, just like the line x = x // 10.
Edit: If x is bounded above, it turns into a philosophical question whether the time complexity is O(log n) or O(1). It depends on how you view it (is the bound a part of the algorithm, or do you view the algorithm as general and the bound just something you know about the input you will get to it).
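To make the log10 growth concrete, here is a quick empirical check (the instrumented counter and the sample values are my additions):

import math

def loop_count(x):
    # Same control flow as code(x), but counting iterations instead
    iters = 0
    while x > 0:
        x = x // 10
        iters += 1
    return iters

for x in (9, 99, 1000, 123456, 10000000):
    print(x, loop_count(x), int(math.log10(x)) + 1)
# The loop runs once per decimal digit of x: O(log x)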
I'm trying to estimate the value of pi to 3 d.p., but it feels extremely slow; for example, it takes almost 20 minutes to loop 10,000 times. I'm assuming that's because it keeps individually checking every single value, so I was wondering if there's any way to loop faster, since I need to find a better average.
import random

def est(b, a, d, c):
    total = 0
    count = 0
    dp = False
    while not dp:
        count += 1
        x, y = random.uniform(a, b), random.uniform(c, d)
        if x**2 + y**2 < 1:
            total += 1
        areaest = ((abs(b-a) * abs(d-c)) / count) * total
        round = float("{:.3f}".format(areaest))
        if round == 3.142:
            dp = True
    return count
There isn't an O() improvement to be had. Monte Carlo methods rely on trying things many, many times.
There are lots of low-level ways to cut cycles, though, which you pick up by experience. For example, move invariant computations out of loops, multiply once instead of raising to the power 2, use float literals where appropriate instead of integer literals that you know will have to be converted to float every time ... Here's a version with a bunch of those:
def est(b, a, d, c):
    from random import uniform
    box_area = float(abs(b-a) * abs(d-c))
    total = count = 0
    while True:
        count += 1
        x, y = uniform(a, b), uniform(c, d)
        if x*x + y*y < 1.0:
            total += 1
        areaest = box_area / count * total
        if abs(areaest - 3.142) < 0.0005:
            break
    return count
BTW, a meta-comment: why are you checking for convergence to 3.142 at all? In a Monte Carlo application, you typically don't know the result you're looking for in advance. If you did, why bother running Monte Carlo?
More typical: you pick a fixed number of iterations to run in advance, then average over many runs, each using that fixed number of iterations. Timing is then at least roughly predictable.
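A minimal sketch of that more typical shape (the 100000-iteration count, unit-square sampling, and 10-run average are assumptions for illustration):

from random import uniform

def pi_estimate(iterations=100000):
    # Sample the unit square; the fraction of points inside the
    # quarter circle approximates pi/4
    hits = 0
    for _ in range(iterations):
        x, y = uniform(0.0, 1.0), uniform(0.0, 1.0)
        if x*x + y*y < 1.0:
            hits += 1
    return 4.0 * hits / iterations

# Average several independent runs for a steadier estimate
runs = [pi_estimate() for _ in range(10)]
print(sum(runs) / len(runs))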
I'm trying to find the time complexity (big-O) of these functions and to provide appropriate reasoning.
The first function is:
r = 0
# Assignment is constant time. Executed once. O(1)
for i in range(n):
    for j in range(i+1, n):
        for k in range(i, j):
            r += 1
            # Assignment and access are O(1). Executed ~n^3 times
I see that this is a triple nested loop, so it must be O(n^3), but I think my reasoning here is very weak. I don't really get what is going on inside the triple nested loop.
The second function is:
i = n
# Assignment is constant time. Executed once. O(1)
while i > 0:
    k = 2 + 2
    i = i // 2
    # i is halved by the line above on each iteration,
    # so the O(1) assignment and access are executed
    # log n times??
I worked this algorithm out to be O(1), but like the first function, I don't see what is going on in the while loop. Can someone explain thoroughly the time complexity of the two functions? Thanks!
For such a simple case, you could find the number of iterations of the innermost loop as a function of n exactly:
$$\sum_{i=0}^{n-1} \sum_{j=i+1}^{n-1} \sum_{k=i}^{j-1} 1 = \frac{1}{6}\,n\,(n^2 - 1)$$
i.e., Θ(n**3) time complexity (see Big Theta): it assumes that r += 1 is O(1) if r has O(log n) digits (model has words with log n bits).
The second loop is even simpler: i //= 2 is i >>= 1. n has Θ(log n) digits and each iteration drops one binary digit (shift right) and therefore the whole loop is Θ(log n) time complexity if we assume that the i >> 1 shift of log(n) digits is O(1) operation (same model as in the first example).
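As a sanity check on the closed form, counting the innermost iterations directly agrees with n(n^2-1)/6 (the instrumentation below is my addition):

def inner_iterations(n):
    # Count how many times the innermost statement runs
    r = 0
    for i in range(n):
        for j in range(i+1, n):
            for k in range(i, j):
                r += 1
    return r

for n in (2, 5, 10, 20):
    print(n, inner_iterations(n), n * (n*n - 1) // 6)
# Both columns match, confirming the Theta(n**3) growth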
Well, first of all, for the first function the time complexity seems to me closer to O(N log N), because the parameters of each loop decrease each time.
Also, for the second function, the runtime is O(log2 N). (If the code used true division, then with i == n == 2 you would get 1, then 0.5, then 0.25, and so on, and the loop would never end; the floor division i // 2 already truncates to an int, which is what you want.)
For a rigorous mathematical approach to each function, you can go to https://www.coursera.org/course/algo. It's a great course for this sort of thing. I was sort of sloppy in my calculations.
I have this Python code to generate prime numbers. I added a little piece of code (between # Start progress code and # End progress code) to display the progress of the operation, but it slowed the operation down.
#!/usr/bin/python
a = input("Enter a number: ")
f = open('data.log', 'w')
for x in range(2, a):
    p = 1
    # Start progress code
    s = (float(x) / float(a)) * 100
    print '\rProcessing ' + str(s) + '%',
    # End progress code
    for i in range(2, x-1):
        c = x % i
        if c == 0:
            p = 0
            break
    if p != 0:
        f.write(str(x) + ", ")
print '\rData written to \'data.log\'. Press Enter to exit...'
raw_input()
My question is how to show the progress of the operation without slowing down the actual code/loop. Thanks in advance ;-)
To answer your question: I/O is very expensive, so printing your progress will have a huge impact on performance. I would avoid printing if possible.
If you are concerned about speed, there is a very nice optimization you can use to greatly speed up your code.
For your inner for loop, instead of
for i in range(2, x-1):
    c = x % i
    if c == 0:
        p = 0
        break
use
for i in range(2, int(x ** 0.5) + 1):
    c = x % i
    if c == 0:
        p = 0
        break
You only need to iterate from 2 up to the square root of the number you are primality testing.
You can use this optimization to offset any performance loss from printing your progress.
Your inner loop is O(n) time, so if you're experiencing lag toward huge numbers, that's pretty normal. Also, you're converting x and a into floats while computing the percentage; as they get bigger, that could slow down your process.
First, I hope this is a toy problem, because (on quick glance) it looks like the whole operation is O(n^2).
You probably want to put this at the top:
from __future__ import division  # make float division the default; the // operator does integer division
Typically for huge loops where each iteration is inexpensive, progress isn't output every iteration because it would take too long (as you say you are experiencing). Try outputting progress either a fixed number of times or with a fixed duration between outputs:
N_OUTPUTS = 100
OUTPUT_EVERY = (a - 2) // N_OUTPUTS
...
# Start progress code
if x % OUTPUT_EVERY == 0:
    print '\rProcessing {}%'.format(100 * x / a),
# End progress code
Or if you want to go by time instead:
UPDATE_DT = 0.5
import time
t = time.time()
...
# Start progress code
if time.time() - t > UPDATE_DT:
    print '\rProcessing {}%'.format(100 * x / a),
    t = time.time()
# End progress code
That's going to be a little more expensive, but will guarantee that even as the inner loop slows down, you won't be left in the dark for more than one iteration or 0.5 seconds, whichever takes longer.
def f2(L):
    sum = 0
    i = 1
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
    return sum
Let n be the size of the list L passed to this function. Which of the following most accurately describes how the runtime of this function grows as n grows?
(a) It grows linearly, like n does.
(b) It grows quadratically, like n^2 does.
(c) It grows less than linearly.
(d) It grows more than quadratically.
I don't understand how you figure out the relationship between the runtime of the function and the growth of n. Can someone please explain this to me?
ok, since this is homework:
this is the code:
def f2(L):
    sum = 0
    i = 1
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
    return sum
It is obviously dependent on len(L).
So let's see what each line costs:
sum = 0
i = 1
# [...]
return sum
Those are obviously constant time, independent of L.
In the loop we have:
sum = sum + L[i]  # time to look up L[i] (timelookup(L)) plus time to add to the sum (constant time)
i = i * 2         # constant time
And how many times is the loop executed?
It's obviously dependent on the size of L.
Let's call that loops(L).
So we get an overall complexity of
loops(L) * (timelookup(L) + const)
Being the nice guy I am, I'll tell you that list lookup is constant time in Python, so it boils down to
O(loops(L)) (constant factors ignored, as big-O convention implies)
And how often do you loop, based on the len() of L?
(a) as often as there are items in the list? (b) quadratically as often as there are items in the list?
(c) less often than there are items in the list? (d) more often than (b)?
I am not a computer science major and I don't claim to have a strong grasp of this kind of theory, but I thought it might be relevant for someone from my perspective to try and contribute an answer.
Your function will always take time to execute, and if it is operating on a list argument of varying length, then the time it takes to run will be relative to how many elements are in that list.
Let's assume it takes 1 unit of time to process a list of length == 1. What the question is asking is the relationship between the list getting bigger and the increase in time for this function to execute.
This link breaks down some basics of Big O notation: http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
If it were O(1) complexity (which is not actually one of your A-D options), it would mean the complexity never grows regardless of the size of L. Obviously, in your example the while loop depends on growing a counter i in relation to the length of L. I would focus on the fact that i is being multiplied, which indicates the relationship between how long it takes to get through that while loop and the length of L. Basically, try to compare how many iterations the while loop needs at various values of len(L), and that will determine your complexity. 1 unit of time can be 1 iteration through the while loop.
Hopefully I have made some form of contribution here, with my own lack of expertise on the subject.
Update
To clarify, based on the comment from ch3ka: if you were doing more than what you currently have inside your while loop, then you would also have to consider the added complexity of each loop. But because your list lookup L[i] is constant complexity, as is the math that follows it, we can ignore those in terms of the overall complexity.
Here's a quick-and-dirty way to find out:
import matplotlib.pyplot as plt

def f2(L):
    sum = 0
    i = 1
    times = 0
    while i < len(L):
        sum = sum + L[i]
        i = i * 2
        times += 1  # track how many times the loop gets called
    return times

def main():
    i = range(1200)
    f_i = [f2([1]*n) for n in i]
    plt.plot(i, f_i)
    plt.show()

if __name__ == "__main__":
    main()
... which results in a step-shaped plot. The horizontal axis is the size of L, the vertical axis is how many times the function loops; the big-O should be pretty obvious from this.
Consider what happens with an input of length n = 10. Now consider what happens if the input size is doubled to 20. Will the runtime double as well? Then it's linear. If the runtime grows by a factor of 4, it's quadratic. Etc.
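Applying that doubling test to this particular f2, a quick sketch (the sample lengths are my choice):

def loop_count(n):
    # Number of while-loop iterations for a list of length n
    i, times = 1, 0
    while i < n:
        i *= 2
        times += 1
    return times

for n in (16, 32, 64, 1024, 2048):
    print(n, loop_count(n))
# Doubling n adds only one iteration: far slower than linear growth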
When you look at the function, you have to determine how the size of the list will affect the number of loops that will occur.
In your specific situation, lets increment n and see how many times the while loop will run.
n = 0, loop = 0 times
n = 1, loop = 0 times
n = 2, loop = 1 time
n = 3, loop = 2 times
n = 4, loop = 2 times
See the pattern? Now answer your question, does it:
(a) It grows linearly, like n does. (b) It grows quadratically, like n^2 does.
(c) It grows less than linearly. (d) It grows more than quadratically.
Check out Hugh's answer for an empirical result :)
It's O(log(len(L))): i doubles on every pass, so the number of passes needed for i to reach len(L) is about log2(len(L)), and list lookup is a constant-time operation, independent of the size of the list.