I was optimising some Python code, and tried the following experiment:
import time
start = time.clock()
x = 0
for i in range(10000000):
    x += 1
end = time.clock()
print '+=',end-start
start = time.clock()
x = 0
for i in range(10000000):
    x -= -1
end = time.clock()
print '-=',end-start
The second loop is reliably faster, anywhere from a whisker to 10%, depending on the system I run it on. I've tried varying the order of the loops, number of executions etc, and it still seems to work.
Stranger still,
for i in range(10000000, 0, -1):
(ie running the loop backwards) is faster than
for i in range(10000000):
even when loop contents are identical.
What gives, and is there a more general programming lesson here?
I can reproduce this on my Q6600 (Python 2.6.2); increasing the range to 100000000:
('+=', 11.370000000000001)
('-=', 10.769999999999998)
First, some observations:
This is 5% for a trivial operation. That's significant.
The speed of the native addition and subtraction opcodes is irrelevant. It's in the noise floor, completely dwarfed by the bytecode evaluation. We're talking about one or two native instructions amid thousands.
Both loops compile to exactly the same number of bytecode instructions; the only difference is INPLACE_ADD vs. INPLACE_SUBTRACT and the constant +1 vs. -1.
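You can check this yourself with the standard dis module. A minimal sketch (the function names here are just for illustration):

import dis

def add_loop():
    x = 0
    for i in range(10):
        x += 1

def sub_loop():
    x = 0
    for i in range(10):
        x -= -1

dis.dis(add_loop)  # the loop body contains INPLACE_ADD
dis.dis(sub_loop)  # same instruction count, with INPLACE_SUBTRACT instead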
Looking at the Python source, I can make a guess. This is handled in ceval.c, in PyEval_EvalFrameEx. INPLACE_ADD has a significant extra block of code, to handle string concatenation. That block doesn't exist in INPLACE_SUBTRACT, since you can't subtract strings. That means INPLACE_ADD contains more native code. Depending (heavily!) on how the code is being generated by the compiler, this extra code may be inline with the rest of the INPLACE_ADD code, which means additions can hit the instruction cache harder than subtraction. This could be causing extra L2 cache hits, which could cause a significant performance difference.
This is heavily dependent on the system you're on (different processors have different amounts of cache and cache architectures), the compiler in use, including the particular version and compilation options (different compilers will decide differently which bits of code are on the critical path, which determines how assembly code is lumped together), and so on.
Also, the difference is reversed in Python 3.0.1 (+: 15.66, -: 16.71); no doubt this critical function has changed a lot.
$ python -m timeit -s "x=0" "x+=1"
10000000 loops, best of 3: 0.151 usec per loop
$ python -m timeit -s "x=0" "x-=-1"
10000000 loops, best of 3: 0.154 usec per loop
Looks like you have some measurement bias.
I think the "general programming lesson" is that it is really hard to predict, solely by looking at the source code, which sequence of statements will be the fastest. Programmers at all levels frequently get caught up by this sort of "intuitive" optimisation. What you think you know may not necessarily be true.
There is simply no substitute for actually measuring your program performance. Kudos for doing so; answering why undoubtedly requires delving deep into the implementation of Python, in this case.
With byte-compiled languages such as Java, Python, and .NET, it is not even sufficient to measure performance on just one machine. Differences between VM versions, native code translation implementations, CPU-specific optimisations, and so on will make this sort of question ever more tricky to answer.
"The second loop is reliably faster ..."
That's your explanation right there. Re-order your script so the subtraction test is timed first, then the addition, and suddenly addition becomes the faster operation again:
-= 3.05
+= 2.84
Obviously something happens to the second half of the script that makes it faster. My guess is that the first call to range() is slower because python needs to allocate enough memory for such a long list, but it is able to re-use that memory for the second call to range():
import time
start = time.clock()
x = range(10000000)
end = time.clock()
del x
print 'first range()',end-start
start = time.clock()
x = range(10000000)
end = time.clock()
print 'second range()',end-start
A few runs of this script show that the extra time needed for the first range() accounts for nearly all of the time difference between '+=' and '-=' seen above:
first range() 0.4
second range() 0.23
It's always a good idea when asking a question to say what platform and what version of Python you are using. Sometimes it doesn't matter. This is NOT one of those times:
time.clock() is appropriate only on Windows. Throw away your own measuring code and use -m timeit as demonstrated in pixelbeat's answer.
Python 2.X's range() builds a list. If you are using Python 2.x, replace range with xrange and see what happens (a quick timing sketch follows below).
Python 3.X's int is Python 2.X's long.
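Here is the quick sketch for the range vs. xrange point (Python 2.x, using the timeit module; the numbers will vary by machine):

import timeit

range_loop = "for i in range(1000000): pass"
xrange_loop = "for i in xrange(1000000): pass"

print 'range ', timeit.timeit(range_loop, number=10)   # builds a 1,000,000-element list each run
print 'xrange', timeit.timeit(xrange_loop, number=10)  # lazy iterator, no list allocation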
Is there a more general programming lesson here?
The more general programming lesson here is that intuition is a poor guide when predicting run-time performance of computer code.
One can reason about algorithmic complexity, hypothesise about compiler optimisations, estimate cache performance and so on. However, since these things can interact in non-trivial ways, the only way to be sure about how fast a particular piece of code is going to be is to benchmark it in the target environment (as you have rightfully done.)
With Python 2.5 the biggest problem here is using range, which will allocate a list that big to iterate over it. When using xrange, whichever is done second is a tiny bit faster for me. (Not sure if range has become a generator in Python 3.)
Your experiment is faulty. The way this experiment should be designed is to write two different programs: one for addition, one for subtraction. They should be exactly the same and run under the same conditions, with the data written to a file. Then you need to average the runs (at least several thousand), but you'd need a statistician to tell you an appropriate number.
If you wanted to analyze different methods of addition, subtraction, and looping, again each of those should be a separate program.
Experimental error might arise from processor heat and other activity going on in the CPU, so I'd execute the runs in a variety of patterns...
That would be remarkable, so I have thoroughly evaluated your code and also set up the experiment as I would find it more correct (all declarations and function calls outside the loop). I ran both versions five times.
Running your code validated your claims:
-= consistently takes less time; 3.6% on average
Running my code, though, contradicts the outcome of your experiment:
+= takes on average (not always) 0.5% less time.
To show all results I have put plots online:
Your evaluation: http://bayimg.com/kadAeaAcN
My evaluation: http://bayimg.com/KadaAaAcN
So, I conclude that your experiment has a bias, and it is significant.
Finally here is my code:
import time
addtimes = [0.] * 100
subtracttimes = [0.] * 100
range100 = range(100)
range10000000 = range(10000000)
j = 0
i = 0
x = 0
start = 0.
for j in range100:
    start = time.clock()
    x = 0
    for i in range10000000:
        x += 1
    addtimes[j] = time.clock() - start
for j in range100:
    start = time.clock()
    x = 0
    for i in range10000000:
        x -= -1
    subtracttimes[j] = time.clock() - start
print '+=', sum(addtimes)
print '-=', sum(subtracttimes)
Running the loop backwards is faster because the computer has an easier time comparing whether a number is equal to 0.
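If you want to test the forward-vs-backward claim yourself, here is a minimal sketch (Python 2.x; treat the output as machine-specific):

import timeit

forward = "for i in range(1000000): pass"
backward = "for i in range(1000000, 0, -1): pass"

print 'forward ', timeit.timeit(forward, number=10)
print 'backward', timeit.timeit(backward, number=10)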
Related
I've been working on the following code which sort of maximizes the number of unique (in lowest common denominator) p by q blocks with some constraints. It is working perfectly. For small inputs. E.g. input 50000, output 1898.
I need to run it on numbers greater than 10^18, and while I have a different solution that gets the job done, this particular version gets super slow (made my desktop reboot at one point), and this is what my question is about.
I'm trying to figure out what is causing the slowdown in the following code, and to figure out in what order of magnitude they are slow.
The candidates for slowness:
1) the (-1)**(i+1) term? Does Python do this efficiently, or is it literally multiplying out -1 by itself a ton of times?
[EDIT: still looking for how operation.__pow__ works, but having tested setting j=-j: this is faster.]
2) set instantiation/size? Is the set getting too large? Obviously this would impact membership check if the set can't get built.
3) set membership check? This indicates O(1) behavior, although I suppose the constant continues to change.
Thanks in advance for insight into these processes.
import math
import time
a=10**18
ti=time.time()
setfrac=set([1])
x=1
y=1
k=2
while True:
    k += 1
    t = 0
    for i in xrange(1, k):
        mo = math.ceil(k/2.0) + ((-1)**(i+1)) * (math.floor(i/2.0))
        if (mo/(k-mo) not in setfrac) and (x+(k-mo) <= a and y+mo <= a):
            setfrac.add(mo/(k-mo))
            x += k-mo
            y += mo
            t += 1
    if t == 0:
        break
print len(setfrac)+1
print x
print y
to=time.time()-ti
print to
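For point 1, a minimal sketch of the j = -j idea from the edit above: replace the repeated (-1)**(i+1) exponentiation with a toggled sign variable. The name sign is introduced here purely for illustration, and k is fixed to a small value just to make the snippet runnable on its own:

import math

k = 10
sign = 1                  # equals (-1)**(i+1) for i = 1
for i in xrange(1, k):
    mo = math.ceil(k / 2.0) + sign * math.floor(i / 2.0)
    print i, mo           # same values the original expression produces
    sign = -sign          # flip the sign instead of recomputing a power each iteration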
I ran a little test between Excel (VBA) and Python performing a simple loop. Code listed below. To my surprise, VBA was significantly faster than Python, almost six times faster. I thought that because Python runs from the command line, its performance would be better. Do you guys have any comments on this?
Python
import time
import ctypes # An included library with Python install.
start_time = time.time()
for x in range(0, 1000000):
    print x
x = ("--- %s seconds ---" % (time.time() - start_time))
ctypes.windll.user32.MessageBoxA(0, x, "Your title", 1)
Excel (VBA)
Sub looptest()
    Dim MyTimer As Double
    MyTimer = Timer
    Dim rng As Range, cell As Range
    Set rng = Range("A1:A1000000")
    x = 1
    For Each cell In rng
        cell.Value = x
        x = x + 1
    Next cell
    MsgBox Timer - MyTimer
End Sub
Your two code samples are not doing the same thing. In the Python code, the inner loop has to:
Ask for the next number in range(0, 1000000).
Display it.
In the VBA code, Excel has to:
Ask for the next cell in Range("A1:A1000000") (which has nothing to do with Python ranges).
Set the cell.Value property.
Run through various code Excel executes whenever it changes a cell.
Check to see if any formulas need to be recalculated.
Display it.
Increment x.
Let's rewrite this so the Python and VBA loops do the same thing, as near as we can:
Python
import time
import ctypes
start_time = time.time()
x = 0
while x <= 1000000:
    x = x + 1
x = ("--- %s seconds ---" % (time.time() - start_time))
ctypes.windll.user32.MessageBoxA(0, x, "Your title", 1)
VBA
Declare Function QueryPerformanceCounter Lib "kernel32" (t As Currency) As Boolean
Declare Function QueryPerformanceFrequency Lib "kernel32" (t As Currency) As Boolean
Sub looptest()
    Dim StartTime As Currency
    QueryPerformanceCounter StartTime
    x = 0
    Do While x <= 1000000
        x = x + 1
    Loop
    Dim EndTime As Currency
    QueryPerformanceCounter EndTime
    Dim Frequency As Currency
    QueryPerformanceFrequency Frequency
    MsgBox Format$((EndTime - StartTime) / Frequency, "0.000")
End Sub
On my computer, Python takes about 96 ms, and VBA 33 ms – VBA performs three times faster. If you throw in a Dim x As Long, it performs six times faster.
Why? Well, let's look at how each gets run. Python internally compiles your .py file into a .pyc, and runs it under the Python VM. Another answer describes the Python case in detail. Excel compiles VBA into MS P-Code, and runs it under the Visual Basic VM.
At this point, it doesn't matter that python.exe is command-line and Excel is GUI. The VM runs your code, and it lives a little deeper in the bowels of your computer. Performance depends on what specific instructions are in the compiled code, and how efficiently the VM runs these instructions. In this case, the VB VM ran its P-Code faster than the Python VM ran its .pyc.
The slow part about this is the print. Printing to the console is incredibly slow, so you should totally avoid it. I assume that setting cell values in Excel is just way faster.
If you want to compare computation speed, you should not have any I/O within the loop. Instead, measure the time it takes to process the whole loop without doing anything inside (or doing something simple like adding a number). If you do that, you will see that Python is very fast.
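As a rough sketch of that point, here is the same counting loop timed with and without console output; the absolute numbers will vary a lot by machine and terminal, but the gap should be obvious:

import time

start = time.time()
for x in range(0, 100000):
    print x                       # console I/O inside the loop
with_io = time.time() - start

start = time.time()
for x in range(0, 100000):
    pass                          # same loop, no I/O
without_io = time.time() - start

print "with print: %.3fs, without: %.3fs" % (with_io, without_io)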
It varies with the computation you need to perform and it is difficult to give a proof which is faster in a simply way, with a simple comparison. So I will share my experience in a generic way, without proofs, but with some comments on things that may generate a big difference.
In my experience, if you compare a simple for loop with exactly the same code between pure Python and pure VBA, VBA is around 3 times faster. But nobody writes the same code in different languages; you have to apply each language's best practices.
If you apply VBA best practices, you can make it even faster by declaring variable types and applying other similar optimizations not available in Python. In my experience this can make the code around 2-3 times faster, so VBA can end up around 6-9 times faster than a simple for loop in Python.
On the other hand, if you apply Python best practices, you generally won't write a plain for loop at all. You will use a list comprehension, or numpy and scipy, which run as compiled C libraries. Those solutions are much faster than VBA code.
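As an illustrative sketch (this assumes numpy is installed; the numbers are machine-dependent), compare summing a million integers with a plain Python loop versus a vectorized numpy call:

import time
import numpy as np

start = time.time()
total = 0
for i in range(1000000):
    total += i
loop_time = time.time() - start

start = time.time()
total = np.arange(1000000).sum()   # the loop runs in compiled C inside numpy
numpy_time = time.time() - start

print "python loop: %.4fs, numpy: %.4fs" % (loop_time, numpy_time)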
In general, if you perform complex matrix calculations that you can express with numpy and scipy, Python will be faster than VBA; in other cases, VBA is faster.
In Python, you can also use Numba, which adds a bit of complexity to the code but generates compiled code and can even run it on the GPU. It will make your code even faster.
This is my experience with pure computation mainly involving internal arrays. I haven't compared performance for I/O with GUI, external files and databases, network communication or API.
I'm reading a book on Python3 (Introducing Python by Bill Lubanovic), and came across something I wasn't sure is a Python preference, or just a "simplification" due to being a book and trying to describe something else.
It's on how to write to a file using chunks instead of in one shot.
poem = '''There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.'''
fout = open('relativity', 'wt')
size = len(poem)
offset = 0
chunk = 100
while True:
    if offset > size:
        break
    fout.write(poem[offset:offset+chunk])
    offset += chunk
fout.close()
I was about to ask why it has while True instead of while (offset > size), but decided to try it for myself, and saw that while (offset > size) doesn't actually do anything in my Python console.
Is that just a bug in the console, or does Python really require you to move the condition inside the while loop like that? With all of the changes to make it as minimal as possible, this seems very verbose.
(I'm coming from a background in Java, C#, and JavaScript where the condition as the definition for the loop is standard.)
EDIT
Thanks to xnx's comment, I realized that I had the logic of my proposed condition backwards.
This brings me back to a clearer question that I originally wanted to focus on:
Does Python prefer to do while True and have the condition use a break inside the loop, or was that just an oversight on the author's part as he tried to explain a different concept?
I was about to ask why it has while True instead of while (offset <= size), but decided to try it for myself,
This is actually how I would have written it. It should be logically equivalent.
and saw that while (offset > size) doesn't actually do anything in my Python console.
You needed to use (offset <= size), not (offset > size). The current logic stops as soon as the offset is greater than size, so reverse the condition if you want to put it in the while statement.
does Python really require you to move the condition inside the while loop like that?
No, Python allows you to write the condition in the while statement directly. Both options are fine, and it really is more a matter of personal preference in how you write your logic. I prefer the simpler form, as you were suggesting, over the original author's version.
This should work fine:
while offset <= size:
    fout.write(poem[offset:offset+chunk])
    offset += chunk
For details, see the documentation for while, which specifically states that any expression can be used before the :.
Edit:
Does Python prefer to do while True and have the condition use a break inside the loop, or was that just an oversight on the author's part as he tried to explain a different concept?
Python does not prefer while True:. Either version is fine, and it's completely a matter of preference for the coder. I personally prefer keeping the expression in the while statement, as I find the code more clear, concise, and maintainable using while offset <= size:.
It is legal Python code to put the conditional in the loop. Personally I think:
while offset <= size:
is clearer than:
while True:
    if offset > size:
        break
I prefer the first form because there's one less branch to follow but the logic is not any more complex. All other things being equal, lower levels of indentation are easier to read.
If there were multiple different conditions that would break out of the loop then it might be preferable to go for the while True syntax.
As for the observed behavior with the incorrect loop logic, consider this snippet:
size = len(poem)
offset = 0
while offset > size:
    # loop code
The while loop will never be entered as offset > size starts off false.
while True:
    if offset > size:
        break
    func(x)
is exactly equivalent to
while offset <= size:
    func(x)
They both run until offset > size. It is simply a different way of phrasing it -- both are acceptable, and I'm not aware of any performance differences. They would only run differently if you had the break condition at the bottom of the while loop (i.e. after func(x))
edit:
According to the Python wiki, in Python 2.* "it slows things down a lot" to put the condition inside the while loop: "this is due to first testing the True condition for the while, then again testing" the break condition. I don't know what measure they use for "a lot", but it seems minuscule enough.
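If you want to measure that claimed difference yourself, here is a minimal timeit sketch; the gap, if any, is likely to be small and machine-dependent:

import timeit

cond_loop = """
offset, size, chunk = 0, 1000000, 100
while offset <= size:
    offset += chunk
"""

true_loop = """
offset, size, chunk = 0, 1000000, 100
while True:
    if offset > size:
        break
    offset += chunk
"""

print 'while cond:', timeit.timeit(cond_loop, number=100)
print 'while True:', timeit.timeit(true_loop, number=100)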
When reading from a file, you usually do
output = []
while True:
    chunk = f.read(chunksize)
    if len(chunk) == 0:
        break
    output.append(chunk)
It seems to me like the author is more used to doing reading than writing, and in this case the reading idiom of using while True has leaked through to the writing code.
As most of the folks answering the question can attest to, using simply while offset <= size is probably more Pythonic and simpler, though even simpler might be just to do
f.write(poem)
since the underlying I/O library can handle the chunked writes for you.
Does Python prefer to do while True and have the condition use a break
inside the loop, or was that just an oversight on the author's part as
he tried to explain a different concept?
No, it doesn't; this is a quirk or error of the author's own.
There are situations where typical Python style (and Guido van Rossum) actively advise using while True, but this isn't one of them. That said, they don't advise against it either. I imagine there are cases where a test would be easier to read and understand as "say when to break" than as "say when to keep going". Even though they're just logical negations of each other, one or the other might express things a little more simply:
while not god_unwilling() and not creek_risen():
while not (god_unwilling() or creek_risen()):
vs.
while True:
    if god_unwilling() or creek_risen():
        break
I still sort of prefer the first, but YMMV. Even better, introduce functions that correspond to the English idiom: god_willing() and creek_dont_rise().
The "necessary" use is when you want to execute the break test at the end or middle of the loop (that is to say, when you want to execute part or all of the loop unconditionally the first time through). Where other languages have a greater variety of more complex loop constructs, and other examples play games with a variable to decide whether to break or not, Guido says "just use while True". Here's an example from the FAQ, for a loop that starts with an assignment:
C code:
while (line = readline(f)) {
    // do something with line
}
Python code:
while True:
    line = f.readline()
    if not line:
        break
    # do something with line
The FAQ remarks (and this relates to typical Python style):
An interesting phenomenon is that most experienced Python programmers
recognize the while True idiom and don’t seem to be missing the
assignment in expression construct much; it’s only newcomers who
express a strong desire to add this to the language.
It also points out that for this particular example, you can in fact avoid the whole problem anyway with for line in f.
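For completeness, a minimal sketch of that for line in f idiom; the filename is just a placeholder:

with open('somefile.txt') as f:
    for line in f:              # iterates lazily, one line at a time
        # do something with line
        print line.rstrip()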
In my opinion, while True is better in big programs with long code, because you cannot always see that a variable may be changed by some function elsewhere. while True means: run this loop no matter what happens, until something inside breaks out or the program is closed. So perhaps the writer of the book wants you to get used to while True because it is a little less risky than the alternatives.
It is better to use while True and handle the variable that may stop the loop inside it.
This code is the recursive factorial function.
The problem is that if I want to calculate a very large number, it generates this error:
RuntimeError : maximum recursion depth exceeded
import time
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print " The factorial of the number is: ", factorial(1500)
time.sleep(3600)
The goal is for the recursive factorial function to keep calculating for a maximum of one hour.
This is a really bad idea. Python is not at all well-suited for recursing that many times. I'd strongly recommend you switch this to a loop which checks a timer and stops when it reaches the limit.
But, if you're seriously interested in increasing the recursion limit in CPython (the default depth is 1000), there's a sys setting for that, sys.setrecursionlimit. Note, as it says in the documentation, that "the highest possible limit is platform-dependent" - meaning there's no way to know when your program will fail. Nor is there any way you, I, or CPython could ever tell whether your program will recurse for something as irrelevant to the actual execution of your code as "an hour." (Just for fun, I tried this with a method that passes an int counting how many times it's already recursed, and I got to 9755 before IDLE totally restarted itself.)
Here's an example of a way I think you should do this:
# be sure to import time
start_time = time.time()
counter = 1
# will execute for an hour
while time.time() < start_time + 3600:
    factorial(counter)  # presumably you'd want to do something with the return value here
    counter += 1
You should also keep in mind that regardless of whether you use iteration or recursion, (unless you're using a separate thread) you're still going to be blocking the entire program for the entirety of the hour.
Don't do that. There is an upper limit on how deep your recursion can get. Instead, do something like this:
def factorial(n):
    result = 1
    for i in range(1, n+1):
        result *= i
    return result
Any recursive function can be rewritten to an iterative function. If your code is fancier than this, show us the actual code and we'll help you rewrite it.
A few things to note here:
You can increase the recursion limit with:
import sys
sys.setrecursionlimit(someNumber) # may be 20000 or bigger.
This will basically just increase your recursion limit. Note that in order for it to run for one hour, this number would have to be so unreasonably big that it is practically impossible. This is one of the problems with recursion, and it is why people turn to iterative programs.
So basically, what you want is practically impossible with recursion; you would be better off with a loop/while approach.
Moreover, your sleep call does not do what you want: sleep just forces the program to wait additional time (freezing your program).
It is a guard against a stack overflow. You can change the recursion limit with sys.setrecursionlimit(newLimit), where newLimit is an integer.
Python isn't a functional language. Rewriting the algorithm iteratively, if possible, is generally a better idea.
I am new to Python and figured I'd play around with problems on Project Euler to have something concrete to do meanwhile.
I came across the idea of timing different solutions to see how they rate against each other. That simple task turned out to be too complicated for my taste however. I read that the time.clock() calls are not accurate enough on unix systems (seconds resolution is simply pathetic with modern processors). Thus I stumbled upon the timeit module which seems to be the first choice for profiling tasks.
I have to say I really don't understand why they went with such a counter-intuitive way to go about it. I can't seem to get it to work, without needing to rewrite/restructure my code, which I find very frustrating.
Take the code below and nevermind for a second that it's neither pretty nor particularly efficient:
import math
import sys
from timeit import Timer
def digitsum(number):
    rem = 0
    while number > 0:
        rem += number % 10
        number //= 10
    return rem

def prime_form(p):
    if p == 2 or p == 3 or p == 5:
        return True
    elif (p-1) % 6 != 0 and (p+1) % 6 != 0:
        return False
    elif digitsum(p) % 3 == 0:
        return False
    elif p % 10 == 0 or p % 10 == 5:
        return False
    else:
        return True

def lfactor(n):
    if n <= 3:
        return 1
    limit = int(math.sqrt(n))
    if limit % 2 == 0:
        limit -= 1
    lfac = 1
    for i in range(3, limit+1, 2):
        if prime_form(i):
            (div, rem) = divmod(n, i)
            if rem == 0:
                lfac = max(lfac, max(lfactor(div), lfactor(i)))
    return lfac if lfac != 1 else n
number = int(sys.argv[1])
t = Timer("""print lfactor(number)""", """import primefacs""")
t.timeit(100)
#print lfactor(number)
If I would like to time the line print lfactor(number), why should I go through a bunch of loops, trying to define a setup statement, etc.? I understand why one would want debug tools that are detached from the code being tested (a la unit testing), but shouldn't there be a simple and straightforward way to get the process time of a chunk of code without much hassle (importing/defining a setup etc.)? What I am thinking of is something like the way one would do it in Java:
long t0 = System.currentTimeInMillis();
// do something
long t = System.currentTimeInMillis() - t0;
.. or even better with MATLAB, using the tic/toc commands:
tic
x = A\b;
t(n) = toc;
Hope this doesn't come across as a rant; I am really trying to understand "the pythonian way of thinking", but honestly it doesn't come naturally here, not at all...
Simple: the logic behind separating the statement and the setup is that the setup is not part of the code you want to benchmark. Normally a Python module is loaded once, while the functions inside it are run many times over.
A Pythonic way of using timeit?
$ python -m timeit -h
Tool for measuring execution time of small code snippets.
This module avoids a number of common traps for measuring execution
times. See also Tim Peters' introduction to the Algorithms chapter in
the Python Cookbook, published by O'Reilly.
Library usage: see the Timer class.
Command line usage:
python timeit.py [-n N] [-r N] [-s S] [-t] [-c] [-h] [--] [statement]
Options:
-n/--number N: how many times to execute 'statement' (default: see below)
-r/--repeat N: how many times to repeat the timer (default 3)
-s/--setup S: statement to be executed once initially (default 'pass')
-t/--time: use time.time() (default on Unix)
-c/--clock: use time.clock() (default on Windows)
-v/--verbose: print raw timing results; repeat for more digits precision
-h/--help: print this usage message and exit
--: separate options from statement, use when statement starts with -
statement: statement to be timed (default 'pass')
[cut]
$ python -m timeit -s 'from primefacs import lfactor' 'lfactor(42)'
$ # this does not work, primefacs is not bound, i.e. not imported
$ python -m timeit 'primefacs.lfactor(42)'
$ # this does not work either, lfactor is not defined
$ python -m timeit 'lfactor(42)'
$ # this works, but the time to import primefacs is benchmarked too,
$ # though it is only imported the first time; successive runs use the cached module.
$ python -m timeit 'import primefacs; primefacs.lfactor(42)'
As you can see, the way timeit works is much more intuitive than you think.
Edit to add:
I read that the time.clock() calls are not accurate enough on unix
systems (seconds resolution is simply pathetic with modern
processors).
quoting the documentation:
On Unix, return the current processor time as a floating point number
expressed in seconds. The precision, and in fact the very definition
of the meaning of “processor time”, depends on that of the C function
of the same name, but in any case, this is the function to use for
benchmarking Python or timing algorithms... The resolution is
typically better than one microsecond.
going on..
I have to say I really don't understand why they went with such a
counter-intuitive way to go about it. I can't seem to get it to work,
without needing to rewrite/restructure my code, which I find very
frustrating.
Yes, it could be, but this is one of those cases where the documentation can help you; here is a link to the examples for the impatient, and here is a gentler introduction to timeit.
When timing a statement, you want to time just that statement, not the setup. The setup could be considerably slower than the statement-under-test.
Note that timeit runs your statement thousands of times to get a reasonable average. It does this to eliminate the effects of OS scheduling and other processes (including but not limited to disk buffer flushing, cronjob execution, memory swapping, etc); only an average time would have any meaning when comparing different code alternatives.
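If you want that best-of-several behaviour explicitly, timeit.repeat does it for you; a small sketch (taking the minimum of the repeats is the conventional way to filter out interference from other processes):

import timeit

times = timeit.repeat("x += 1", setup="x = 0", repeat=5, number=1000000)
print min(times)    # best of 5 runs of a million iterations each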
For your case, just test lfactor(number) directly, and just use the timeit() function:
timeit.timeit('lfactor(number)', 'from __main__ import lfactor, number')
The setup code retrieves the lfactor() function, as well as number taken from sys.argv from the main script; the function and number won't otherwise be seen.
There is absolutely no point in performance testing the print statement, that's not what you are trying to time. Using timeit is not about seeing the result of the call, just the time it takes to run it. Since the code-under-test is run thousands of times, all you'd get is thousands of prints of (presumably) the same result.
Note that usually timeit is used to compare performance characteristics of short python snippets; to find performance bottlenecks in more complex code, use profiling instead.
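For example, a brief profiling sketch using the standard cProfile module (lfactor and number being the names from the script above):

import cProfile

cProfile.run('lfactor(number)')   # prints per-function call counts and cumulative times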
If you want to time just one run, use the timeit.default_timer() function to get the most accurate timer for your platform:
timer = timeit.default_timer
start = timer()
print lfactor(number)
time_taken = timer() - start