What's the difference between "while 1" and "while True"? - python

I've seen two ways to create an infinite loop in Python:
while 1:
    do_something()
while True:
    do_something()
Is there any difference between these? Is one more pythonic than the other?

Fundamentally it doesn't matter; such minutiae don't really affect whether something is 'pythonic' or not.
If you're interested in trivia however, there are some differences.
The builtin boolean type didn't exist till Python 2.3 so code that was intended to run on ancient versions tends to use the while 1: form. You'll see it in the standard library, for instance.
The True and False builtins are not reserved words prior to Python 3 so could be assigned to, changing their value. This helps with the case above because code could do True = 1 for backwards compatibility, but means that the name True needs to be looked up in the globals dictionary every time it is used.
Because of the above restriction, the bytecode the two versions compile to is different in Python 2 as there's an optimisation for constant integers that it can't use for True. Because Python can tell when compiling the 1 that it's always non-zero, it removes the conditional jump and doesn't load the constant at all:
>>> import dis
>>> def while_1():
...     while 1:
...         pass
...
>>> def while_true():
...     while True:
...         pass
...
>>> dis.dis(while_1)
2 0 SETUP_LOOP 5 (to 8)
3 >> 3 JUMP_ABSOLUTE 3
6 POP_TOP
7 POP_BLOCK
>> 8 LOAD_CONST 0 (None)
11 RETURN_VALUE
>>> dis.dis(while_true)
2 0 SETUP_LOOP 12 (to 15)
>> 3 LOAD_GLOBAL 0 (True)
6 JUMP_IF_FALSE 4 (to 13)
9 POP_TOP
3 10 JUMP_ABSOLUTE 3
>> 13 POP_TOP
14 POP_BLOCK
>> 15 LOAD_CONST 0 (None)
18 RETURN_VALUE
So, while True: is a little easier to read, and while 1: is a bit kinder to old versions of Python. As you're unlikely to need to run on Python 2.2 these days or need to worry about the bytecode count of your loops, the former is marginally preferable.

The most pythonic way will always be the most readable. Use while True:

It doesn't really matter. Neither is hard to read or understand, though personally I'd always use while True, which is a bit more explicit.
More generally, a lot of the while True/break loops people write in Python could be something else. Sometimes I see people write i = 0; while True: i += 1 ..., which can be replaced with for i in itertools.count(), and people write while True: foo = fun(); if foo is None: break, when this can be written as for foo in iter(fun, None). These require some learning but have less boilerplate and fewer opportunities for silly mistakes.
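A small sketch of both replacements (the concrete loop bodies here are made up purely for illustration):

```python
import itertools

# Counting loop: "i = 0; while True: i += 1 ..." becomes itertools.count()
squares = []
for i in itertools.count(1):        # i = 1, 2, 3, ...
    if i * i > 50:
        break                       # loop body still decides when to stop
    squares.append(i * i)
# squares == [1, 4, 9, 16, 25, 36, 49]

# Sentinel loop: "while True: foo = fun(); if foo is None: break"
# becomes iter(fun, None), which calls fun() until it returns the sentinel.
data = iter([3, 1, 4, None, 5])
fun = lambda: next(data)            # stand-in for the original fun()
seen = list(iter(fun, None))
# seen == [3, 1, 4]
```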

Neither.
Both of them mean I have to scan the code looking for the break, instead of being able to see the stop condition right where it belongs.
I try to avoid this kind of thing wherever possible, and if it's not possible, let the code speak for itself like this:
while not found_answer:
    check_number += 1
    if check_number == 42:
        found_answer = True
Edit: It seems that the word "avoid" above wasn't clear enough. Using a basically infinite loop and leaving it from somewhere within the loop (using break) should usually be avoided altogether. Sometimes that isn't possible. In that case, I like to use something like the code above, which, however, still represents the same concept – the above code is nothing more than a compromise – but at least, I can show the purpose of the loop at the beginning – just like I wouldn't call a function do_something_with_args(*args).

IMO the second option is more obvious.
If you could get rid of the while and write more compact code, that might be more pythonic.
For example:
# Get the even numbers in the range 1..10
# Version 1
l = []
n = 1
while 1:
    if n % 2 == 0: l.append(n)
    n += 1
    if n > 10: break
print l
# Version 2
print [i for i in range(1, 11) if i % 2 == 0]
# Version 3
print range(2, 11, 2)

I think this is mostly a matter of style. Both should be easily understandable as an infinite loop.
However, personally I prefer the second option. That's because it just takes a mental micro-step less to understand, especially for programmers without C background.

The first one will work also in those early versions where True is not yet defined.

If you have an algorithm that is supposed to terminate in finite time, I would recommend this, which is always safer than while True:
maxiter = 1000
for i in xrange(maxiter):
    # your code
    # on success:
    break
else:
    # the algorithm has not finished in maxiter steps! do something accordingly
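To make the for/else pattern concrete, here is a self-contained sketch (find_divisor is a made-up example; the else clause runs only when the loop completes without hitting break):

```python
def find_divisor(n, max_iter=1000):
    """Return the smallest divisor >= 2 of n found within max_iter steps."""
    for d in range(2, max_iter):
        if n % d == 0:
            found = d
            break            # success: leave the loop early
    else:
        # runs only if the loop finished without hitting break
        raise RuntimeError("no divisor found within max_iter steps")
    return found

find_divisor(91)                   # 7 (91 == 7 * 13)
# find_divisor(1009, max_iter=30) raises RuntimeError (1009 is prime)
```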

I believe the second expression is more explicit, and thus more pythonic.

This is only a matter of style; any programming beginner will understand either option.
But the second option only behaves as expected if True hasn't been assigned to False, which was possible until Python 3:
>>> True = False
>>> True
False

The better way is "while True" with a conditional break out of the loop.

Related

Nested if statements or And

Is a nested statement or the use of and in an if statement more efficient? Basically, is:
if condition1:
    if condition2:
        # do stuff
or
if condition1 and condition2:
    # do stuff
more efficient, or are they similar in performance? And for the sake of readability, which should I choose?
Short answer: no.
dis disassembles a Python function to the bytecode executed by the interpreter. For both functions the number of executed operations is the same.
But the and clause is generally preferred because of better code readability and less nesting.
Code:
>>> def test_inner_if(condition1=True, condition2=True):
...     if condition1:
...         if condition2:
...             pass
...
>>> def test_and_if(condition1=True, condition2=True):
...     if condition1 and condition2:
...         pass
...
>>> import dis
>>> dis.dis(test_inner_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
3 4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
4 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> dis.dis(test_and_if)
2 0 LOAD_FAST 0 (condition1)
2 POP_JUMP_IF_FALSE 8
4 LOAD_FAST 1 (condition2)
6 POP_JUMP_IF_FALSE 8
3 >> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
The second one is generally preferred. In terms of run time they won't make any noticeable difference, but in terms of readability (good code is easily readable) the second one is preferred.
Just go for readability.
If you're testing something simple like x > 3 and x < 30, just use one if.
If your conditions are function calls, multiple ifs will probably be easier to read and easier to debug.
The best piece of advice I have gotten in terms of optimization is: "find the slowest part of your code and optimize the heck out of that".
The context of the if statements matters. For example, in one case they could be deep inside three or more for loops, where going for good optimization (and commenting your logic) would be worthwhile. In another case you may be determining whether to raise an error at the beginning of a function; there, readability is critical.
Secondly, how you optimize is important too. The interpreter may treat both forms as equivalent, which means it is best to go for readability. One easy way to find out is to time it:
import time
s = time.perf_counter()  # time.clock() was removed in Python 3.8
# your code here
print(time.perf_counter() - s)  # shows how long the code segment took to run
This can be an interesting experiment whenever you have an optimization question.

What does the power operator (**) in Python translate into?

In other words, what exists behind the two asterisks? Is it simply multiplying the number x times or something else?
As a follow-up question, is it better to write 2**3 or 2*2*2. I'm asking because I've heard that in C++ it's better to not use pow() for simple calculations, since it calls a function.
If you're interested in the internals, I'd disassemble the instruction to get the CPython bytecode it maps to. Using Python 3:
>>> import dis
>>> def test():
...     return 2**3
...
>>> dis.dis(test)
2 0 LOAD_CONST 3 (8)
3 RETURN_VALUE
OK, so that seems to have done the calculation right on entry, and stored the result. You get exactly the same CPython bytecode for 2*2*2 (feel free to try it). So, for the expressions that evaluate to a constant, you get the same result and it doesn't matter.
What if you want the power of a variable?
Now you get two different bits of bytecode:
>>> def test(n):
...     return n ** 3
...
>>> dis.dis(test)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (3)
6 BINARY_POWER
7 RETURN_VALUE
vs.
>>> def test(n):
...     return n * 2 * 2
...
>>> dis.dis(test)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 BINARY_MULTIPLY
7 LOAD_CONST 1 (2)
10 BINARY_MULTIPLY
11 RETURN_VALUE
Now the question is of course, is the BINARY_MULTIPLY quicker than the BINARY_POWER operation?
The best way to try that is to use timeit. I'll use the IPython %timeit magic. Here's the output for multiplication:
%timeit test(100)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 163 ns per loop
and for power
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 473 ns per loop
You may wish to repeat this for representative inputs, but empirically it looks like the multiplication is quicker (but note the mentioned caveat about the variance in the output).
If you want further internals, I'd suggest digging into the CPython code.
While the second one is a little faster for numbers, that advantage is small compared to the first one's advantage: readability. If you are chasing time and are pressured to make such micro-optimizations, Python is probably not the language you should use.
Note: for values other than numbers:
a ** b translates to
a.__pow__(b)
whereas a * a * a, since * is left-associative, is a call to
(a.__mul__(a)).__mul__(a)
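A quick sketch confirming that dispatch (Tracked is a toy class invented for this demonstration):

```python
class Tracked(int):
    """Toy int subclass that records which special methods run."""
    calls = []

    def __pow__(self, other):
        Tracked.calls.append("__pow__")
        return int(self) ** int(other)

    def __mul__(self, other):
        Tracked.calls.append("__mul__")
        return Tracked(int(self) * int(other))

a = Tracked(2)
a ** 3        # one __pow__ call
a * a * a     # two __mul__ calls, left to right: (a * a) * a
# Tracked.calls == ["__pow__", "__mul__", "__mul__"]
```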
Test Code:
import time
s = time.time()
for x in xrange(1, 1000000):
    x**5
print "done in ", time.time() - s
s = time.time()
for x in xrange(1, 1000000):
    x*x*x*x*x
print "done in ", time.time() - s
For my machine it yields:
done in 0.975429058075
done in 0.260419845581
If you ask frankly, multiplication is a bit faster.
>>> timeit.timeit('[i*i*i*i for i in range(100)]', number=10000)
0.262529843304
>>> timeit.timeit('[i**4 for i in range(100)]', number=10000)
0.31143438383
But speed isn't the only thing to consider when choosing between the two options. For example, what is easier when computing 2 to the power 20: simply writing 2**20, or using a for loop that iterates 20 times and multiplies as it goes?
The ** operator will, internally, use an iterative function (same semantics as built-in pow() (Python docs), which likely means it just calls that function anyway).
Therefore, if you know the power and can hardcode it, using 2*2*2 would likely be a little faster than 2**3. This has a little to do with the function, but I believe the main performance issue is that it will use a loop.
Note that it's still quite silly to replace more readable code for less readable code when it's something as simple as 2**3, the performance gain is minimal at best.
From the docs:
The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right. The syntax is:
power ::= primary ["**" u_expr]
Thus, in an unparenthesized sequence of power and unary operators, the operators are evaluated from right to left (this does not constrain the evaluation order for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function, when called with two arguments: it yields its left argument raised to the power of its right argument.
This means that, in Python: 2**2**3 is evaluated as 2**(2**3) = 2**8 = 256.
In mathematics, stacked exponents are applied from the top down. If it were not done this way you would just get multiplication of exponents:
(((2**3)**4)**5) = 2**(3*4*5)
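These precedence and associativity rules are easy to verify interactively:

```python
assert 2 ** 2 ** 3 == 2 ** (2 ** 3) == 256   # ** is right-associative
assert (2 ** 2) ** 3 == 64                   # explicit left grouping differs
assert -1 ** 2 == -1                         # parsed as -(1 ** 2)
assert (-1) ** 2 == 1
```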
It might be a little faster just to do the multiplication, but much less readable.

Yield only as many are required from a generator

I wish to yield from a generator only as many items are required.
In the following code
a, b, c = itertools.count()
I receive this exception:
ValueError: too many values to unpack
I've seen several related questions, however I have zero interest in the remaining items from the generator, I only wish to receive as many as I ask for, without providing that quantity in advance.
It seems to me that Python determines the number of items you want, but then proceeds to try to read and store more than that number (generating ValueError).
How can I yield only as many items as I require, without passing in how many items I want?
Update0
To aid in understanding the approximate behaviour I believe should be possible, here's a code sample:
def unpack_seq(ctx, items, seq):
    for name in items:
        setattr(ctx, name, seq.next())

import itertools, sys
unpack_seq(sys.modules[__name__], ["a", "b", "c"], itertools.count())
print a, b, c
If you can improve this code please do.
Alex Martelli's answer suggests to me the byte op UNPACK_SEQUENCE is responsible for the limitation. I don't see why this operation should require that generated sequences must also exactly match in length.
Note that Python 3 has different unpack syntaxes which probably invalidate technical discussion in this question.
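For what it's worth, a Python 3 flavoured sketch of the sample above (take_named is a hypothetical helper; it uses the builtin next() instead of the removed .next() method, and returns a dict rather than mutating a module):

```python
import itertools

def take_named(names, seq):
    """Bind the next len(names) items of seq to the given names (sketch)."""
    return {name: next(seq) for name in names}

bound = take_named(["a", "b", "c"], itertools.count())
# bound == {"a": 0, "b": 1, "c": 2}; the generator is advanced exactly 3 times
```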
You need a deep bytecode hack - what you request cannot be done at Python source code level, but (if you're willing to commit to a specific version and release of Python) may be doable by post-processing the bytecode after Python has compiled it. Consider, e.g.:
>>> def f(someit):
...     a, b, c = someit()
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (someit)
3 CALL_FUNCTION 0
6 UNPACK_SEQUENCE 3
9 STORE_FAST 1 (a)
12 STORE_FAST 2 (b)
15 STORE_FAST 3 (c)
18 LOAD_CONST 0 (None)
21 RETURN_VALUE
>>>
As you see, the UNPACK_SEQUENCE 3 bytecode occurs without the iterator someit being given the least indication of it (the iterator has already been called!) -- so you must prefix it in the bytecode with a "get exactly N bytes" operation, e.g.:
>>> def g(someit):
...     a, b, c = limit(someit(), 3)
...
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (limit)
3 LOAD_FAST 0 (someit)
6 CALL_FUNCTION 0
9 LOAD_CONST 1 (3)
12 CALL_FUNCTION 2
15 UNPACK_SEQUENCE 3
18 STORE_FAST 1 (a)
21 STORE_FAST 2 (b)
24 STORE_FAST 3 (c)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
where limit is your own function, easily implemented (e.g. via itertools.islice). So, the original two-bytecode "load fast; call function" sequence (just before the unpack-sequence bytecode) must become this kind of five-bytecode sequence, with a load-global bytecode for limit before the original sequence, and the load-const; call-function sequence after it.
You can of course implement this bytecode munging in a decorator.
Alternatively (for generality) you could work on altering the function's original source, e.g. via parsing and alteration of the AST, and recompiling to byte code (but that does require the source to be available at decoration time, of course).
Is this worth it for production use? Absolutely not, of course: it's a ridiculous amount of work for a tiny bit of "syntax sugar". However, it can be an instructive project to pursue to gain mastery in bytecode hacking, AST hacking, and other black magic tricks that you will probably never need but are surely cool to know when you want to move beyond the level of mere language wizard to that of world-class guru. Indeed I suspect that those who are motivated to become gurus are typically people who can't fail to yield to the fascination of such "language internals" mechanics... and those who actually make it to that exalted level are the subset wise enough to realize such endeavors are "just play", and to pursue them as a spare-time activity (the mental equivalent of weight lifting, much like, say, sudoku or crosswords;-) without letting them interfere with the important tasks (delivering value to users by deploying solid, clear, simple, well-performing, well-tested, well-documented code, most often without even the slightest hint of black magic to it;-).
You need to make sure that the number of items on both sides matches. One way is to use islice from the itertools module:
from itertools import islice
a, b, c = islice(itertools.count(), 3)
Python does not work the way you desire. In any assignment, like
a, b, c = itertools.count()
the right-hand side is evaluated first, before the left-hand side.
The right-hand side can not know how many items are on the left-hand side unless you tell it.
Or use a list comprehension, since you know the desired number of items:
ic = itertools.count()
a,b,c = [ic.next() for i in range(3)]
or even simpler:
a,b,c = range(3)

Python `if x is not None` or `if not x is None`? [closed]

I've always thought of the if not x is None version to be more clear, but Google's style guide and PEP-8 both use if x is not None. Are there any minor performance differences (I'm assuming not), and is there any case where one really doesn't fit (making the other a clear winner for my convention)?*
*I'm referring to any singleton, rather than just None.
...to compare singletons like
None. Use is or is not.
There's no performance difference, as they compile to the same bytecode:
>>> import dis
>>> dis.dis("not x is None")
1 0 LOAD_NAME 0 (x)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
>>> dis.dis("x is not None")
1 0 LOAD_NAME 0 (x)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
Stylistically, I try to avoid not x is y, since a human reader might misunderstand it as (not x) is y. If I write x is not y then there is no ambiguity.
Both Google's and Python's style guides recommend the best practice:
if x is not None:
    # Do something about x
Using not x can cause unwanted results.
See below:
>>> x = 1
>>> not x
False
>>> x = [1]
>>> not x
False
>>> x = 0
>>> not x
True
>>> x = [0] # You don't want to fall in this one.
>>> not x
False
You may be interested to see what literals are evaluated to True or False in Python:
Truth Value Testing
Edit for comment below:
I just did some more testing. not x is None doesn't negate x first and then compare to None. In fact, the is operator binds more tightly than not when used that way:
>>> x
[0]
>>> not x is None
True
>>> not (x is None)
True
>>> (not x) is None
False
Therefore, not x is None is just, in my honest opinion, best avoided.
More edit:
I just did more testing and can confirm that bukzor's comment is correct. (At least, I wasn't able to prove it otherwise.)
This means if x is not None has the exact result as if not x is None. I stand corrected. Thanks bukzor.
However, my answer still stands: Use the conventional if x is not None. :]
Code should be written to be understandable to the programmer first, and the compiler or interpreter second. The "is not" construct resembles English more closely than "not is".
Python if x is not None or if not x is None?
TLDR: The bytecode compiler parses them both to x is not None - so for readability's sake, use if x is not None.
Readability
We use Python because we value things like human readability, useability, and correctness of various paradigms of programming over performance.
Python optimizes for readability, especially in this context.
Parsing and Compiling the Bytecode
The not binds more weakly than is, so there is no logical difference here. See the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
The is not is specifically provided for in the Python grammar as a readability improvement for the language:
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
And so it is a unitary element of the grammar as well.
Of course, it is not parsed the same:
>>> import ast
>>> ast.dump(ast.parse('x is not None').body[0].value)
"Compare(left=Name(id='x', ctx=Load()), ops=[IsNot()], comparators=[Name(id='None', ctx=Load())])"
>>> ast.dump(ast.parse('not x is None').body[0].value)
"UnaryOp(op=Not(), operand=Compare(left=Name(id='x', ctx=Load()), ops=[Is()], comparators=[Name(id='None', ctx=Load())]))"
But then the byte compiler will actually translate the not ... is to is not:
>>> import dis
>>> dis.dis(lambda x, y: x is not y)
1 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 COMPARE_OP 9 (is not)
9 RETURN_VALUE
>>> dis.dis(lambda x, y: not x is y)
1 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 COMPARE_OP 9 (is not)
9 RETURN_VALUE
So for the sake of readability and using the language as it was intended, please use is not.
To not use it is not wise.
The answer is simpler than people are making it.
There's no technical advantage either way, and "x is not y" is what everybody else uses, which makes it the clear winner. It doesn't matter that it "looks more like English" or not; everyone uses it, which means every user of Python--even Chinese users, whose language Python looks nothing like--will understand it at a glance, where the slightly less common syntax will take a couple extra brain cycles to parse.
Don't be different just for the sake of being different, at least in this field.
Personally, I use
if not (x is None):
which is understood immediately without ambiguity by every programmer, even those not expert in the Python syntax.
The is not operator is preferred over negating the result of is for stylistic reasons. "if x is not None:" reads just like English, but "if not x is None:" requires understanding of operator precedence and does not read like English.
If there is a performance difference my money is on is not, but this almost certainly isn't the motivation for the decision to prefer that technique. It would obviously be implementation-dependent. Since is isn't overridable, it should be easy to optimise out any distinction anyhow.
if not x is None is more similar to other programming languages, but if x is not None definitely sounds more clear (and is more grammatically correct in English) to me.
That said it seems like it's more of a preference thing to me.
I would prefer the more readable form x is not y over puzzling out how operator precedence applies in not x is y; the former simply produces more readable code.

Python: How expensive is to create a small list many times?

I encounter the following small annoying dilemma over and over again in Python:
Option 1:
cleaner but slower(?) if called many times, since a_list gets re-created on each call of do_something()
def do_something():
    a_list = ["any", "think", "whatever"]
    # read something from a_list
Option 2:
Uglier but more efficient (spares re-creating a_list over and over)
a_list = ["any", "think", "whatever"]
def do_something():
    # read something from a_list
What do you think?
What's ugly about it?
Are the contents of the list always constants, as in your example? If so, recent versions of Python (since 2.4) will optimise that by evaluating the constant expression and keeping the result, but only if it's a tuple. So you could change it to a tuple. Or you could stop worrying about small things like that.
Here's a list of constants and a tuple of constants:
>>> def afunc():
...     a = ['foo', 'bar', 'zot']
...     b = ('oof', 'rab', 'toz')
...     return
...
>>> import dis; dis.dis(afunc)
2 0 LOAD_CONST 1 ('foo')
3 LOAD_CONST 2 ('bar')
6 LOAD_CONST 3 ('zot')
9 BUILD_LIST 3
12 STORE_FAST 0 (a)
3 15 LOAD_CONST 7 (('oof', 'rab', 'toz'))
18 STORE_FAST 1 (b)
4 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
>>>
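A rough way to see the effect of that constant-folding (the numbers are illustrative and vary by machine and interpreter version):

```python
import timeit

# Building a fresh list vs. loading a pre-built tuple constant, a million times
list_time = timeit.timeit("a = ['foo', 'bar', 'zot']", number=1_000_000)
tuple_time = timeit.timeit("b = ('oof', 'rab', 'toz')", number=1_000_000)

# The tuple is a single LOAD_CONST, so tuple_time is typically much smaller
print("list: %.3fs  tuple: %.3fs" % (list_time, tuple_time))
```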
Never create something more than once if you don't have to. This is a simple optimization that can be done on your part, and I personally do not find the second example ugly at all.
Some may argue not to worry about optimizing little things like this, but I feel that something this simple to fix should be done immediately. I would hate to see your application create multiple copies of anything it doesn't need, simply to preserve an arbitrary sense of "code beauty". :)
Option 3:
def do_something(a_list=("any", "think", "whatever")):
    # read something from a_list
Option 3 compared to Option 1:
Both are equally readable in my opinion (though some seem to think differently in the comments! :-) ). You could even write Option 3 like this
def do_something(
        a_list=("any", "think", "whatever")):
    # read something from a_list
which really minimizes the difference in terms of readability.
Unlike Option 1, however, Option 3 defines a_list only once -- at the time when do_something is defined. That's exactly what we want.
Option 3 compared to Option 2:
Avoid global variables if possible. Option 3 allows you to do that.
Also, with Option 2, over time or if other people maintain this code, the definition of a_list could get separated from def do_something. This may not be a big deal, but I think it is somewhat undesirable.
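One caveat worth adding to Option 3: it works precisely because the default is an immutable tuple. Default values are evaluated once, when the function is defined, so a mutable default such as a list is shared between calls, a classic pitfall (buggy and safe are made-up names for this sketch):

```python
def buggy(item, acc=[]):       # one list object shared by every call!
    acc.append(item)
    return acc

def safe(item, acc=None):      # the conventional workaround
    if acc is None:
        acc = []
    acc.append(item)
    return acc

first = buggy("x")
second = buggy("y")            # returns ["x", "y"], not ["y"]: list is shared
fresh1 = safe("x")
fresh2 = safe("y")             # each call gets a fresh list: ["y"]
```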
If your a_list doesn't change, move it out of the function.
You have some data
You have a method associated with it
You don't want to keep the data globally just for the sake of optimising the speed of the method unless you have to.
I think this is what classes are for.
class Processor:
    def __init__(self):
        self.data = "any thing whatever".split()
    def fun(self, arg):
        # do stuff with arg and self.data
        pass

inst = Processor()
inst.fun("skippy")
Also, if you someday want to separate out the data into a file, you can just modify the constructor to do so.
Well it seems it comes down to initializing the array in the function or not:
import time

def fun1():
    a = ['any', 'think', 'whatever']
    sum = 0
    for i in range(100):
        sum += i

def fun2():
    sum = 0
    for i in range(100):
        sum += i

def test_fun(fun, times):
    start = time.time()
    for i in range(times):
        fun()
    end = time.time()
    print "Function took %s" % (end - start)

# Test
print 'warming up'
test_fun(fun1, 100)
test_fun(fun2, 100)
print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)
print 'Again'
print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)
and the results:
>python test.py
warming up
Function took 0.000604152679443
Function took 0.000600814819336
Testing fun1
Function took 0.597407817841
Testing fun2
Function took 0.580779075623
Again
Testing fun1
Function took 0.595198154449
Testing fun2
Function took 0.580571889877
Looks like there is no difference.
I've worked on automated systems that process 100,000,000+ records a day, where a 1% percent performance improvement is huge.
I learned a big lesson working on that system: Faster is better, but only when you know when it's fast enough.
A 1% improvement would have been a huge reduction in total processing time, but it isn't enough to effect when we would need our next hardware upgrade. My application was so fast, that the amount of time I spent trying to milk that last 1% probably cost more than a new server would have.
In your case, you would have to call do_something tens of thousands of times before making a significant difference in performance. In some cases that would make a difference, in other it won't.
If the list is never modified, why do you use lists at all?
Without knowing your actual requirements, I'd recommend to simply use some if-statements to get rid of the list and the "read something from list" part completely.
