Should you always favor xrange() over range()? - python

Why or why not?

For performance, especially when you're iterating over a large range, xrange() is usually better. However, there are still a few cases where you might prefer range():
In Python 3, range() does what xrange() used to do, and xrange() does not exist. If you want to write code that will run on both Python 2 and Python 3, you can't use xrange().
range() can actually be faster in some cases, e.g. when iterating over the same sequence multiple times. xrange() has to reconstruct the integer object on every iteration, but range() will have real integer objects. (It will always perform worse in terms of memory, however.)
xrange() isn't usable in all cases where a real list is needed. For instance, it doesn't support slices, or any list methods.
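For example, in a Python 2 session (the exact wording of the error message may differ between versions):
>>> r = range(10)        # a real list
>>> r[2:5]               # slicing works
[2, 3, 4]
>>> r.count(3)           # list methods work
1
>>> xr = xrange(10)
>>> xr[2:5]              # no slicing on xrange objects
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence index must be integer, not 'slice'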
[Edit] There are a couple of posts mentioning how range() will be upgraded by the 2to3 tool. For the record, here's the output of running the tool on some sample usages of range() and xrange()
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: ws_comma
--- range_test.py (original)
+++ range_test.py (refactored)
@@ -1,7 +1,7 @@
 for x in range(20):
-    a=range(20)
+    a=list(range(20))
     b=list(range(20))
     c=[x for x in range(20)]
     d=(x for x in range(20))
-    e=xrange(20)
+    e=range(20)
As you can see, when used in a for loop or comprehension, or where already wrapped with list(), range is left unchanged.
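For the record as well, a common idiom for source that has to run unchanged on both versions is to alias the name once at the top of the module (a sketch; the six and future libraries provide maintained equivalents):
try:
    range = xrange   # Python 2: use the lazy object under the 3.x name
except NameError:
    pass             # Python 3: range is already lazy

for i in range(10 ** 6):   # lazy on both versions
    pass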

No, they both have their uses:
Use xrange() when iterating, as it saves memory. Say:
for x in xrange(1, one_zillion):
rather than:
for x in range(1, one_zillion):
On the other hand, use range() if you actually want a list of numbers.
multiples_of_seven = range(7,100,7)
print "Multiples of seven < 100: ", multiples_of_seven

You should favour range() over xrange() only when you need an actual list. For instance, when you want to modify the list returned by range(), or when you wish to slice it. For iteration or even just normal indexing, xrange() will work fine (and usually much more efficiently). There is a point where range() is a bit faster than xrange() for very small lists, but depending on your hardware and various other details, the break-even can be at a result of length 1 or 2; not something to worry about. Prefer xrange().

One other difference is that the Python 2 implementation of xrange() can't support numbers bigger than C ints, so if you want to have a range using Python's built-in large number support, you have to use range().
Python 2.7.3 (default, Jul 13 2012, 22:29:01)
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> range(123456787676676767676676,123456787676676767676679)
[123456787676676767676676L, 123456787676676767676677L, 123456787676676767676678L]
>>> xrange(123456787676676767676676,123456787676676767676679)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
Python 3 does not have this problem:
Python 3.2.3 (default, Jul 14 2012, 01:01:48)
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> range(123456787676676767676676,123456787676676767676679)
range(123456787676676767676676, 123456787676676767676679)

xrange() is more efficient because instead of generating a list of objects, it just generates one object at a time. Instead of 100 integers, and all of their overhead, and the list to put them in, you just have one integer at a time. Faster generation, better memory use, more efficient code.
Unless I specifically need a list for something, I always favor xrange().
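You can put rough numbers on the container overhead with sys.getsizeof (Python 2; exact sizes are platform-dependent, and note this doesn't even count the int objects the list also keeps alive):
import sys
print sys.getsizeof(range(1000))    # the list alone is several KB on a 64-bit build
print sys.getsizeof(xrange(1000))   # a few dozen bytes, regardless of length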

range() returns a list, xrange() returns an xrange object.
xrange() is a bit faster, and a bit more memory efficient. But the gain is not very large.
The extra memory used by a list is of course not just wasted; lists have more functionality (slicing, repetition, insertion, ...). Exact differences can be found in the documentation. There is no hard-and-fast rule; use what is needed.
Python 3.0 is still in development, but IIRC range() will be very similar to xrange() of 2.x, and list(range()) can be used to generate lists.

I would just like to say that it REALLY isn't that difficult to get an xrange object with slice and indexing functionality. I have written some code that works pretty dang well and is just as fast as xrange where it counts (iteration).
from __future__ import division

def read_xrange(xrange_object):
    # returns the xrange object's start, stop, and step
    start = xrange_object[0]
    if len(xrange_object) > 1:
        step = xrange_object[1] - xrange_object[0]
    else:
        step = 1
    stop = xrange_object[-1] + step
    return start, stop, step

class Xrange(object):
    ''' creates an xrange-like object that supports slicing and indexing.
    ex: a = Xrange(20)
    a.index(10)
    will work
    Also a[:5]
    will return another Xrange object with the specified attributes
    Also allows for the conversion from an existing xrange object
    '''
    def __init__(self, *inputs):
        # allow inputs of xrange objects
        if len(inputs) == 1:
            test, = inputs
            if type(test) == xrange:
                self.xrange = test
                self.start, self.stop, self.step = read_xrange(test)
                return
        # or create one from start, stop, step
        self.start, self.step = 0, 1
        if len(inputs) == 1:
            self.stop, = inputs
        elif len(inputs) == 2:
            self.start, self.stop = inputs
        elif len(inputs) == 3:
            self.start, self.stop, self.step = inputs
        else:
            raise ValueError(inputs)
        self.xrange = xrange(self.start, self.stop, self.step)

    def __iter__(self):
        return iter(self.xrange)

    def __getitem__(self, item):
        if type(item) is int:
            if item < 0:
                item += len(self)
            return self.xrange[item]
        if type(item) is slice:
            # convert the slice's indexes into actual range values
            # (negative slice steps are not handled here)
            start, stop, step = item.start, item.stop, item.step
            start = start if start is not None else 0  # slice start defaults to 0
            if start < 0:
                start += len(self)
            start = self[start]  # index -> value
            if start < self.start:
                raise IndexError(item)
            step = self.step * (step if step is not None else 1)
            if stop is None:
                stop = self.stop
            else:
                if stop < 0:
                    stop += len(self)
                stop = self.start + stop * self.step  # index -> value
            if stop > self.stop:
                raise IndexError(item)
            return Xrange(start, stop, step)

    def index(self, value):
        error = ValueError('object.index({0}): {0} not in object'.format(value))
        index = (value - self.start) / self.step
        if index % 1 != 0:
            raise error
        index = int(index)
        try:
            self.xrange[index]
        except (IndexError, TypeError):
            raise error
        return index

    def __len__(self):
        return len(self.xrange)
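A quick sanity check of the class above (Python 2):
a = Xrange(20)
print a.index(10)                    # 10
print list(a[2:8])                   # [2, 3, 4, 5, 6, 7]
print list(Xrange(xrange(3, 9, 2)))  # [3, 5, 7]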
Honestly, I think the whole issue is kind of silly and xrange should do all of this anyway...

Go with range for these reasons:
1) xrange will be going away in newer Python versions. This gives you easy future compatibility.
2) range will take on the efficiencies associated with xrange.

A good example given in book: Practical Python By Magnus Lie Hetland
>>> zip(range(5), xrange(100000000))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
I wouldn’t recommend using range instead of xrange in the preceding example—although
only the first five numbers are needed, range calculates all the numbers, and that may take a lot
of time. With xrange, this isn’t a problem because it calculates only those numbers needed.
Yes, I read Brian's answer: in Python 3, range() is lazy like the old xrange() anyway, and xrange() does not exist.

While xrange is faster than range in most circumstances, the difference in performance is pretty minimal. The little program below compares iterating over a range and an xrange:
import timeit
# Try various list sizes.
for list_len in [1, 10, 100, 1000, 10000, 100000, 1000000]:
    # Time doing a range and an xrange.
    rtime = timeit.timeit('a=0;\nfor n in range(%d): a += n' % list_len, number=1000)
    xrtime = timeit.timeit('a=0;\nfor n in xrange(%d): a += n' % list_len, number=1000)
    # Print the result
    print "Loop list of len %d: range=%.4f, xrange=%.4f" % (list_len, rtime, xrtime)
The results below show that xrange is indeed faster, but not enough to sweat over.
Loop list of len 1: range=0.0003, xrange=0.0003
Loop list of len 10: range=0.0013, xrange=0.0011
Loop list of len 100: range=0.0068, xrange=0.0034
Loop list of len 1000: range=0.0609, xrange=0.0438
Loop list of len 10000: range=0.5527, xrange=0.5266
Loop list of len 100000: range=10.1666, xrange=7.8481
Loop list of len 1000000: range=168.3425, xrange=155.8719
So by all means use xrange, but unless you're on constrained hardware, don't worry too much about it.

Okay, everyone here has a different opinion as to the tradeoffs and advantages of xrange versus range. They're mostly correct: xrange is lazy like an iterator, and range fleshes out and creates an actual list. For the majority of cases, you won't really notice a difference between the two. (You can use list methods on the result of range but not on an xrange object, and range uses up more memory.)
What I think you really want to hear, however, is that the preferred choice is xrange. Since range in Python 3 is lazy like 2.x's xrange, the code conversion tool 2to3 will correctly convert all uses of xrange to range, and will wrap remaining bare uses of range in list() where needed. If you want to be sure to easily convert your code in the future, you'll use xrange only, and list(xrange(...)) when you're sure that you want a list. I learned this during the CPython sprint at PyCon this year (2008) in Chicago.
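Concretely, the pattern looks like this in Python 2 (n and process are placeholders; 2to3 rewrites both xrange calls to range and leaves the list() wrapper alone):
for i in xrange(n):
    process(i)
squares = list(xrange(10))   # becomes list(range(10)) under 2to3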

range(): range(1, 10) returns a list of the numbers from 1 to 9 and holds the whole list in memory.
xrange(): Like range(), but instead of returning a list, returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient. An xrange() object is like an iterator and generates the numbers on demand (lazy evaluation).
In [1]: range(1,10)
Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]
In [2]: xrange(10)
Out[2]: xrange(10)
In [3]: print xrange.__doc__
xrange([start,] stop[, step]) -> xrange object
In Python 3, range() does what xrange() used to do, and xrange() does not exist there.
range() can actually be faster in some scenarios, e.g. if you are iterating over the same sequence multiple times: xrange() has to reconstruct the integer objects every time, but range() keeps real integer objects.
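A quick way to check that repeated-iteration claim yourself (Python 2; a sketch, absolute numbers will vary by machine):
import timeit
# The list pays its construction cost once and then reuses real int objects;
# the xrange object recreates each int on every pass.
print timeit.timeit('for n in seq: pass', setup='seq = range(1000)', number=10000)
print timeit.timeit('for n in seq: pass', setup='seq = xrange(1000)', number=10000)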

Related

Python generator vs list as array initializer

Here's an example of initializing an array of ten million random numbers, using a list (a), and using a generator expression (b). The result is exactly the same, and the intermediate list or generator is never used afterwards, so there's no practical advantage to one or the other:
from random import randint
from array import array
a = array('H', [randint(1, 100) for _ in range(0, 10000000)])
b = array('H', (randint(1, 100) for _ in range(0, 10000000)))
So the question is which one to use. In principle, my understanding is that a tuple should be able to get away with using fewer resources than a list, but since this list and generator are not kept, it should be possible for the code to execute without ever initializing the intermediate data structure… My tests indicate that the list is slightly faster in this case. I can only imagine that this is because the Python implementation has more optimization around lists than tuples. Can I expect this to be consistent?
More generally, should I use one or the other, and why? (Or should I do this kind of initialization some other way completely?)
Update: Answers and comments made me realize that the b example is not actually a tuple but a generator, so I edited a bit in the headline and the text above to reflect that. Also I tried splitting the list version into two lines like this, which should force the list to actually be instantiated:
g = [randint(1, 100) for _ in range(0, 10000000)]
a = array('H', g)
It appears to make no difference. The list version takes about 8.5 seconds, and the generator version takes about 9 seconds.
Although it looks like it, (randint(1, 100) for _ in range(0, 1000000)) is not a tuple, it's a generator:
>>> type((randint(1, 100) for _ in range(0, 1000000)))
<class 'generator'>
>>>
If you really want a tuple, use:
b = array('H', tuple(randint(1, 100) for _ in range(0, 1000000)))
The list being a bit faster than the generator makes sense, since the generator generates the next value when asked, one at a time, while the list comprehension allocates all the memory needed and then proceeds to fill it with values all in one go. That optimisation for speed is paid for in memory space.
I'd favour the generator, since it will work regardless of most reasonable memory restrictions and would work for any number of random numbers, while the speedup of the list is minimal. Unless you need to generate this list again and again, at which time the speedup would start to count - but then you'd probably use the same copy of the list each time to begin with.
[randint(1, 100) for _ in range(0, 10000000)]
This is a list comprehension. Every element is evaluated in a tight loop and put together into a list, so it is generally faster but takes more RAM (everything comes out at once).
(randint(1, 100) for _ in range(0, 10000000))
This is a generator expression. No element is evaluated at this point, and one of them comes out at a time when you call next() on the resulting generator. It's slower but takes a consistent (small) amount of memory.
As given in the other answer, if you want a tuple, you should convert either into one:
tuple([randint(1, 100) for _ in range(0, 10000000)])
tuple(randint(1, 100) for _ in range(0, 10000000))
Let's come back to your question:
When to use which?
In general, if you use a list comprehension or generator expression as an initializer of another sequential data structure (list, array, etc.), it makes no difference except for the memory-time tradeoff mentioned above. The things you need to consider are as simple as performance and memory budget: prefer the list comprehension if you need more speed (or write a C program to be absolutely fast), and the generator expression if you need to keep memory consumption low.
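A rough way to see the memory side of that tradeoff (exact sizes vary by platform, and getsizeof doesn't count the elements a generator has yet to produce):
import sys

squares_list = [x * x for x in range(100000)]   # all elements materialized now
squares_gen = (x * x for x in range(100000))    # lazy, constant footprint

print(sys.getsizeof(squares_list))   # hundreds of KB, grows with the length
print(sys.getsizeof(squares_gen))    # ~100 bytes, independent of the length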
If you plan to reuse the resulting sequence, things start to get interesting.
A list is strictly a list, and can for all purposes be used as a list:
a = [i for i in range(5)]
a[3]          # 3
a.append(5)   # a = [0, 1, 2, 3, 4, 5]
for _ in a:
    print("Hello")
# Prints 6 lines in total
for _ in a:
    print("Bye")
# Prints another 6 lines
b = list(reversed(a))  # b = [5, 4, 3, 2, 1, 0]
A generator can be only used once.
a = (i for i in range(5))
a[3]          # TypeError: generator object isn't subscriptable
a.append(5)   # AttributeError: generator has no attribute 'append'
for _ in a:
    print("Hello")
# Prints 5 lines in total
for _ in a:
    print("Bye")
# Nothing this time, because
# the generator has already been consumed
b = list(reversed(a))  # TypeError: generator isn't reversible
The final answer is: Know what you want to do, and find the appropriate data structure for it.

Objects of type xrange

When I'm reading xrange reference, it says like this..
Objects of type xrange are similar to buffers in that there is no specific syntax to create them, but they are created using the xrange() function. They don’t support slicing, concatenation or repetition, and using in, not in, min() or max() on them is inefficient.
However, as far as I have seen, every xrange() I have ever used is with in, as in for x in xrange(10): do something...
So why does it say this usage is inefficient? And what is the right way to use xrange?
Quoting Performance Tips:
xrange is a generator object, basically equivalent to the following
Python 2.3 code:
def xrange(start, stop=None, step=1):
    if stop is None:
        stop = start
        start = 0
    else:
        stop = int(stop)
    start = int(start)
    step = int(step)
    while start < stop:
        yield start
        start += step
Except that it is implemented in pure C.
They say that in is inefficient on xrange objects because in falls back to iterating over the object when the __contains__ approach fails. From Membership test details:
For classes which do not define __contains__() but do define
__iter__(), x in y is true if some value z with x == z is
produced while iterating over y.
xrange does not implement __contains__, and in order to "find" element N in xrange(N + 1) the in operator has to perform N iterations, so
N in xrange(N + 1)
is logically equivalent to
for n in xrange(N + 1):
    if n == N:
        break
and it's not efficient.
not in is inefficient because in is inefficient.
Note that the performance of the in operator for containment tests doesn't affect the performance of the for loop. These are two different things.
In fact, the "in" in the grammar rule for the for loop (shown below)
for_stmt ::= "for" target_list "in" expression_list ":" suite
["else" ":" suite]
is fixed and is not an operator.
No; what they actually meant is
>>> 5 in xrange(0, 10)
True
which is a containment test. It is inefficient since it has to traverse all the elements in the worst case.
It is not about the for loop, which is correct and efficient. I suppose the doc is a bit misleading.
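You can see the linear scan directly by timing containment tests at opposite ends of the range (Python 2; a sketch):
import timeit
# Worst case: the value is at the very end, so `in` iterates the whole range.
print timeit.timeit('999999 in xrange(1000000)', number=10)   # slow, full scan
print timeit.timeit('0 in xrange(1000000)', number=10)        # fast, found at once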

Python memory error for integer and math

When I run my code below I get Memory Error
import math
X = 600851475143
halfX = math.trunc(int(X / 2))
countFactors = 0
for i in range(halfX):
    if i > 0 and X % i:
        countFactors += 1
print countFactors
I understand it is because of the math calculations here, but I do not know how to correct it.
I'm going to guess you're using Python 2.7 (or 2.x, at any rate).
If that's the case, you should use xrange instead of range.
In Python 3.x, range creates an iterable that only uses a few dozen bytes of memory regardless of how large the range is. In Python 2.x, range always creates a list containing numbers counting up (or down) over the specified range. Calling range(some_large_number) can cause you to run out of memory in 2.x.
Therefore, Python 2.x has xrange, which behaves like range in 3.x.
Also, you can simplify your math somewhat. For example:
x = 600851475143
half_x = x // 2
count_factors = 0
for i in xrange(half_x):
    if i > 0 and x % i == 0:
        count_factors += 1
print count_factors
However, there are much more efficient ways to do this.
As a simple example, if the number is divisible by two, you can iterate over every other number, cutting the number of tests in half. Similarly, if it's divisible by 3, 5, etc.
I'll leave it to you to figure out the generalization. It's a fun problem :)
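As a sketch of the kind of speedup hinted at (not from the original answer): pair each divisor d <= sqrt(x) with its partner x // d, so the loop runs about sqrt(x) times instead of x / 2 times. Note that, unlike the code above, this counts x itself as a factor.
def count_factors(x):
    count = 0
    d = 1
    while d * d <= x:
        if x % d == 0:
            count += 2        # both d and x // d divide x
            if d * d == x:
                count -= 1    # perfect square: d == x // d, count it once
        d += 1
    return count

print count_factors(600851475143)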

Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:
1_000_000_000_000_000 in range(1_000_000_000_000_001)
Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).
I have also tried things like this, but the calculation is still almost instant:
# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)
If I try to implement my own range function, the result is not so nice!
def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return
What is the range() object doing under the hood that makes it so fast?
Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.
The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.
The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.
From the range() object documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
So at a minimum, your range() object would do:
class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('my_range object index out of range')

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0
This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.
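A quick check of the sketch (expected values follow from the start/stop/step math):
r = my_range(4, 1000, 2)
print(len(r))      # 498
print(r[5])        # 14
print(994 in r)    # True: 4 <= 994 < 1000 and (994 - 4) % 2 == 0
print(995 in r)    # False: the step "steps over" 995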
I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.
* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.
The fundamental misunderstanding here is in thinking that range is a generator. It's not. In fact, it's not any kind of iterator.
You can tell this pretty easily:
>>> a = range(5)
>>> print(list(a))
[0, 1, 2, 3, 4]
>>> print(list(a))
[0, 1, 2, 3, 4]
If it were a generator, iterating it once would exhaust it:
>>> b = my_crappy_range(5)
>>> print(list(b))
[0, 1, 2, 3, 4]
>>> print(list(b))
[]
What range actually is, is a sequence, just like a list. You can even test this:
>>> import collections.abc
>>> isinstance(a, collections.abc.Sequence)
True
This means it has to follow all the rules of being a sequence:
>>> a[3] # indexable
3
>>> len(a) # sized
5
>>> 3 in a # membership
True
>>> reversed(a) # reversible
<range_iterator at 0x101cd2360>
>>> a.index(3) # implements 'index'
3
>>> a.count(3) # implements 'count'
1
The difference between a range and a list is that a range is a lazy or dynamic sequence; it doesn't remember all of its values, it just remembers its start, stop, and step, and creates the values on demand on __getitem__.
(As a side note, if you print(iter(a)), you'll see that range gets its own compact iterator type, just as list does; but iteration doesn't rely on anything special about the object beyond a C implementation of __getitem__, which is why the same sequence machinery works fine for range too.)
Now, there's nothing that says that Sequence.__contains__ has to be constant time—in fact, for obvious examples of sequences like list, it isn't. But there's nothing that says it can't be. And it's easier to implement range.__contains__ to just check it mathematically ((val - start) % step, but with some extra complexity to deal with negative steps) than to actually generate and test all the values, so why shouldn't it do it the better way?
But there doesn't seem to be anything in the language that guarantees this will happen. As Ashwini Chaudhary points out, if you give it a non-integral value, instead of converting to integer and doing the mathematical test, it will fall back to iterating all the values and comparing them one by one. And just because CPython 3.2+ and PyPy 3.x versions happen to contain this optimization, and it's an obvious good idea and easy to do, there's no reason that IronPython or NewKickAssPython 3.x couldn't leave it out. (And in fact, CPython 3.0-3.1 didn't include it.)
If range actually were a generator, like my_crappy_range, then it wouldn't make sense to test __contains__ this way, or at least the way it makes sense wouldn't be obvious. If you'd already iterated the first 3 values, is 1 still in the generator? Should testing for 1 cause it to iterate and consume all the values up to 1 (or up to the first value >= 1)?
Use the source, Luke!
In CPython, range(...).__contains__ (a method wrapper) will eventually delegate to a simple calculation which checks if the value can possibly be in the range. The reason for the speed here is we're using mathematical reasoning about the bounds, rather than a direct iteration of the range object. To explain the logic used:
Check that the number is between start and stop, and
Check that the stride value doesn't "step over" our number.
For example, 994 is in range(4, 1000, 2) because:
4 <= 994 < 1000, and
(994 - 4) % 2 == 0.
The full C code is included below, which is a bit more verbose because of memory management and reference counting details, but the basic idea is there:
static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    PyObject *zero = NULL;
    int result = -1;

    zero = PyLong_FromLong(0);
    if (zero == NULL) /* MemoryError in int(0) */
        goto end;

    /* Check if the value can possibly be in the range. */
    cmp1 = PyObject_RichCompareBool(r->step, zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }
    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;

    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);

end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    Py_XDECREF(zero);
    return result;
}

static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);
    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}
The "meat" of the idea is mentioned in the comment lines:
/* positive steps: start <= ob < stop */
/* negative steps: stop < ob <= start */
/* result = ((int(ob) - start) % step) == 0 */
As a final note - look at the range_contains function at the bottom of the code snippet. If the exact type check fails then we don't use the clever algorithm described, instead falling back to a dumb iteration search of the range using _PySequence_IterSearch! You can check this behaviour in the interpreter (I'm using v3.5.0 here):
>>> x, r = 1000000000000000, range(1000000000000001)
>>> class MyInt(int):
... pass
...
>>> x_ = MyInt(x)
>>> x in r # calculates immediately :)
True
>>> x_ in r # iterates for ages.. :(
^\Quit (core dumped)
To add to Martijn’s answer, this is the relevant part of the source (in C, as the range object is written in native code):
static int
range_contains(rangeobject *r, PyObject *ob)
{
    if (PyLong_CheckExact(ob) || PyBool_Check(ob))
        return range_contains_long(r, ob);
    return (int)_PySequence_IterSearch((PyObject*)r, ob,
                                       PY_ITERSEARCH_CONTAINS);
}
So for PyLong objects (which is int in Python 3), it will use the range_contains_long function to determine the result. And that function essentially checks if ob is in the specified range (although it looks a bit more complex in C).
If it’s not an int object, it falls back to iterating until it finds the value (or not).
The whole logic could be translated to pseudo-Python like this:
def range_contains(rangeObj, obj):
    if isinstance(obj, int):
        return range_contains_long(rangeObj, obj)
    # default logic by iterating
    return any(obj == x for x in rangeObj)

def range_contains_long(r, num):
    if r.step > 0:
        # positive step: r.start <= num < r.stop
        cmp2 = r.start <= num
        cmp3 = num < r.stop
    else:
        # negative step: r.start >= num > r.stop
        cmp2 = num <= r.start
        cmp3 = r.stop < num
    # outside of the range boundaries
    if not cmp2 or not cmp3:
        return False
    # num must be on a valid step inside the boundaries
    return (num - r.start) % r.step == 0
If you're wondering why this optimization was added to range.__contains__, and why it wasn't added to xrange.__contains__ in 2.7:
First, as Ashwini Chaudhary discovered, issue 1766304 was opened explicitly to optimize [x]range.__contains__. A patch for this was accepted and checked in for 3.2, but not backported to 2.7 because "xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late." (2.7 was nearly out at that point.)
Meanwhile:
Originally, xrange was a not-quite-sequence object. As the 3.1 docs say:
Range objects have very little behavior: they only support indexing, iteration, and the len function.
This wasn't quite true; an xrange object actually supported a few other things that come automatically with indexing and len,* including __contains__ (via linear search). But nobody thought it was worth making them full sequences at the time.
Then, as part of implementing the Abstract Base Classes PEP, it was important to figure out which builtin types should be marked as implementing which ABCs, and xrange/range claimed to implement collections.Sequence, even though it still only handled the same "very little behavior". Nobody noticed that problem until issue 9213. The patch for that issue not only added index and count to 3.2's range, it also re-worked the optimized __contains__ (which shares the same math with index, and is directly used by count).** This change went in for 3.2 as well, and was not backported to 2.x, because "it's a bugfix that adds new methods". (At this point, 2.7 was already past rc status.)
So, there were two chances to get this optimization backported to 2.7, but they were both rejected.
* In fact, you even get iteration for free with indexing alone, but in 2.3 xrange objects got a custom iterator.
** The first version actually reimplemented it, and got the details wrong—e.g., it would give you MyIntSubclass(2) in range(5) == False. But Daniel Stutzbach's updated version of the patch restored most of the previous code, including the fallback to the generic, slow _PySequence_IterSearch that pre-3.2 range.__contains__ was implicitly using when the optimization doesn't apply.
The other answers explained it well already, but I'd like to offer another experiment illustrating the nature of range objects:
>>> r = range(5)
>>> for i in r:
...     print(i, 2 in r, list(r))
...
0 True [0, 1, 2, 3, 4]
1 True [0, 1, 2, 3, 4]
2 True [0, 1, 2, 3, 4]
3 True [0, 1, 2, 3, 4]
4 True [0, 1, 2, 3, 4]
As you can see, a range object is an object that remembers its range and can be used many times (even while iterating over it), not just a one-time generator.
It's all about a lazy approach to evaluation plus some extra optimization of range. Values in ranges don't need to be computed until real use, and sometimes not even then, thanks to the extra optimization.
By the way, your integer is not that big; consider sys.maxsize:
sys.maxsize in range(sys.maxsize) is pretty fast
due to the optimization: it's easy to compare a given integer just with the min and max of the range.
but:
Decimal(sys.maxsize) in range(sys.maxsize) is pretty slow.
(In this case there is no optimization in range, so if Python receives an unexpected Decimal, it will compare all the numbers.)
You should be aware of this implementation detail, but it should not be relied upon, because it may change in the future.
TL;DR
The object returned by range() is actually a range object. This object implements the iterator interface so you can iterate over its values sequentially, just like a generator, list, or tuple.
But it also implements the __contains__ interface which is actually what gets called when an object appears on the right-hand side of the in operator. The __contains__() method returns a bool of whether or not the item on the left-hand side of the in is in the object. Since range objects know their bounds and stride, this is very easy to implement in O(1).
Due to the optimization, it is very easy to compare given integers just with the min and max of the range.
The reason that the range() function is so fast in Python 3 is that here we use mathematical reasoning about the bounds, rather than a direct iteration of the range object.
So for explaining the logic here:
Check whether the number is between start and stop.
Check whether the step value doesn't "step over" our number.
Take an example, 997 is in range(4, 1000, 3) because:
4 <= 997 < 1000, and (997 - 4) % 3 == 0.
Try x-1 in (i for i in range(x)) for large x values, which uses a generator expression to avoid invoking the range.__contains__ optimisation.
TL;DR:
A range is an arithmetic series, so it can very easily calculate whether a value is present, and could even compute its index quickly, as if it were a list.
Its __contains__ method compares directly against the start, stop, and step of the range.
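The same arithmetic powers range.index(), which is why it is fast too:
r = range(4, 1000, 2)
print(994 in r)        # True, checked arithmetically
print(r.index(994))    # 495, i.e. (994 - 4) // 2, no iteration needed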

What is the difference between range and xrange functions in Python 2.X?

Apparently xrange is faster, but I have no idea why it's faster (and no proof besides the anecdotal, so far, that it is), or what else is different about
for i in range(0, 20):
for i in xrange(0, 20):
In Python 2.x:
range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.
xrange is a sequence object that evaluates lazily.
In Python 3:
range does the equivalent of Python 2's xrange. To get the list, you have to explicitly use list(range(...)).
xrange no longer exists.
range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.
xrange is a sequence object that evaluates lazily.
This is true, but in Python 3, range() is implemented like the Python 2 xrange(). If you need to actually generate the list, you will need to do:
list(range(1,100))
Remember, use the timeit module to test which of small snippets of code is faster!
$ python -m timeit 'for i in range(1000000):' ' pass'
10 loops, best of 3: 90.5 msec per loop
$ python -m timeit 'for i in xrange(1000000):' ' pass'
10 loops, best of 3: 51.1 msec per loop
Personally, I always use range(), unless I'm dealing with really huge lists -- as you can see, time-wise, for a list of a million entries, the extra overhead is only 0.04 seconds. And as Corey points out, in Python 3.0 xrange() will go away and range() will give you nice iterator behavior anyway.
xrange only stores the range params and generates the numbers on demand. However the C implementation of Python currently restricts its args to C longs:
xrange(2**32-1, 2**32+1) # When long is 32 bits, OverflowError: Python int too large to convert to C long
range(2**32-1, 2**32+1) # OK --> [4294967295L, 4294967296L]
Note that in Python 3.0 there is only range and it behaves like the 2.x xrange but without the limitations on minimum and maximum end points.
xrange returns an iterator and only keeps one number in memory at a time. range keeps the entire list of numbers in memory.
Do spend some time with the Library Reference. The more familiar you are with it, the faster you can find answers to questions like this. Especially important are the first few chapters about builtin objects and types.
The advantage of the xrange type is that an xrange object will always
take the same amount of memory, no matter the size of the range it represents.
There are no consistent performance advantages.
Another way to find quick information about a Python construct is the docstring and the help-function:
print xrange.__doc__ # def doc(x): print x.__doc__ is super useful
help(xrange)
The doc clearly reads:
This function is very similar to range(), but returns an xrange object instead of a list. This is an opaque sequence type which yields the same values as the corresponding list, without actually storing them all simultaneously. The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break).
You will find the advantage of xrange over range in this simple example:
import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    pass
t2 = timeit.default_timer()
print "time taken: ", (t2-t1)  # 4.49153590202 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    pass
t2 = timeit.default_timer()
print "time taken: ", (t2-t1)  # 7.04547905922 seconds
The above example doesn't reflect anything substantially better in case of xrange.
Now look at the following case where range is really really slow, compared to xrange.
import timeit

t1 = timeit.default_timer()
a = 0
for i in xrange(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()
print "time taken: ", (t2-t1)  # 0.000764846801758 seconds

t1 = timeit.default_timer()
a = 0
for i in range(1, 100000000):
    if i == 10000:
        break
t2 = timeit.default_timer()
print "time taken: ", (t2-t1)  # 2.78506207466 seconds
With range, it already creates the list from 0 to 100000000 (time consuming), but xrange only generates numbers based on need, that is, as the iteration continues.
In Python 3, the implementation of the range functionality is the same as that of xrange in Python 2, and xrange itself has been done away with in Python 3.
Happy Coding!!
range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements.
xrange is generator-like, so it evaluates lazily.
This brings you two advantages:
You can iterate over longer ranges without getting a MemoryError.
As it resolves each number lazily, if you stop iteration early, you won't waste time creating the whole list.
It is for optimization reasons.
range() will create a list of values from start to end (0 .. 20 in your example). This will become an expensive operation on very large ranges.
xrange() on the other hand is much more optimised. It will only compute the next value when needed (via an xrange sequence object) and does not create a list of all values like range() does.
range(): range(1, 10) returns a list of the numbers from 1 to 9 and holds the whole list in memory.
xrange(): Like range(), but instead of returning a list, returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient.
An xrange() object is like an iterator and generates the numbers on demand (lazy evaluation).
In [1]: range(1,10)
Out[1]: [1, 2, 3, 4, 5, 6, 7, 8, 9]
In [2]: xrange(10)
Out[2]: xrange(10)
In [3]: print xrange.__doc__
xrange([start,] stop[, step]) -> xrange object
range(x,y) returns a list of each number in between x and y-1, so printing range(x,y) shows the whole list; in a for loop, range is the slower of the two.
xrange(x,y) returns the lazy object xrange(x,y); printing it shows only xrange(x,y), but it still yields every number in the range on demand, and in a for loop xrange is faster.
[In] range(1,10)
[Out] [1, 2, 3, 4, 5, 6, 7, 8, 9]
[In] xrange(1,10)
[Out] xrange(1,10)
If you use a for loop, then it would work
[In] for i in range(1,10):
         print i
[Out] 1
2
3
4
5
6
7
8
9
[In] for i in xrange(1,10):
         print i
[Out] 1
2
3
4
5
6
7
8
9
There isn't much difference when using loops, though there is a difference when just printing it!
Some of the other answers mention that Python 3 eliminated 2.x's range and renamed 2.x's xrange to range. However, unless you're using 3.0 or 3.1 (which nobody should be), it's actually a somewhat different type.
As the 3.1 docs say:
Range objects have very little behavior: they only support indexing, iteration, and the len function.
However, in 3.2+, range is a full sequence—it supports extended slices, and all of the methods of collections.abc.Sequence with the same semantics as a list.*
And, at least in CPython and PyPy (the only two 3.2+ implementations that currently exist), it also has constant-time implementations of the index and count methods and the in operator (as long as you only pass it integers). This means writing 123456 in r is reasonable in 3.2+, while in 2.7 or 3.1 it would be a horrible idea.
* The fact that issubclass(xrange, collections.Sequence) returns True in 2.6-2.7 and 3.0-3.1 is a bug that was fixed in 3.2 and not backported.
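For example, in CPython 3.2+ all of these return instantly, even though the range is astronomically large:
r = range(0, 10**15, 7)
print(7 * 10**14 in r)       # True, computed from start/stop/step
print(r.index(7 * 10**9))    # 1000000000, also constant time
print(r.count(21))           # 1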
In python 2.x
range(x) returns a list, that is created in memory with x elements.
>>> a = range(5)
>>> a
[0, 1, 2, 3, 4]
xrange(x) returns an xrange object, which generates the numbers on demand; they are computed during the for loop (lazy evaluation).
For looping, this is slightly faster than range() and more memory efficient.
>>> b = xrange(5)
>>> b
xrange(5)
When testing range against xrange in a loop (I know I should use timeit, but this was swiftly hacked up from memory using a simple list comprehension example) I found the following:
import time
for x in range(1, 10):
    t = time.time()
    [v*10 for v in range(1, 10000)]
    print "range: %.4f" % ((time.time()-t)*100)
    t = time.time()
    [v*10 for v in xrange(1, 10000)]
    print "xrange: %.4f" % ((time.time()-t)*100)
which gives:
$python range_tests.py
range: 0.4273
xrange: 0.3733
range: 0.3881
xrange: 0.3507
range: 0.3712
xrange: 0.3565
range: 0.4031
xrange: 0.3558
range: 0.3714
xrange: 0.3520
range: 0.3834
xrange: 0.3546
range: 0.3717
xrange: 0.3511
range: 0.3745
xrange: 0.3523
range: 0.3858
xrange: 0.3997 <- garbage collection?
Or, using xrange in the for loop:
range: 0.4172
xrange: 0.3701
range: 0.3840
xrange: 0.3547
range: 0.3830
xrange: 0.3862 <- garbage collection?
range: 0.4019
xrange: 0.3532
range: 0.3738
xrange: 0.3726
range: 0.3762
xrange: 0.3533
range: 0.3710
xrange: 0.3509
range: 0.3738
xrange: 0.3512
range: 0.3703
xrange: 0.3509
Is my snippet testing properly? Any comments on the slower instance of xrange? Or a better example :-)
xrange() and range() work the same way for the user, but the difference comes in when we talk about how memory is allocated when using each function.
When we use range(), we allocate memory for all the values it generates, so it is not recommended when a large number of values must be generated.
xrange(), on the other hand, generates only one particular value at a time, and is typically used in a for loop to run through all the values required.
range generates the entire list and returns it. xrange does not -- it generates the numbers in the list on demand.
xrange uses an iterator (generates values on the fly), range returns a list.
What?
range returns a static list at runtime.
xrange returns an object (which acts like a generator, although it's certainly not one) from which values are generated as and when required.
When to use which?
Use xrange if you want to iterate over a gigantic range, say 1 billion values, especially when you have a "memory sensitive system" like a cell phone.
Use range if you want to iterate over the list several times.
PS: Python 3.x's range function == Python 2.x's xrange function.
Everyone has explained it greatly, but I wanted to see it for myself. I use Python 3. So, I opened the resource monitor (in Windows!), and first executed the following command:
a = 0
for i in range(1, 100000):
    a = a + i
and then checked the change in 'In Use' memory. It was insignificant.
Then, I ran the following code:
for i in list(range(1, 100000)):
    a = a + i
And it took a big chunk of the memory for use, instantly. And I was convinced.
You can try it for yourself.
If you are using Python 2.x, then replace range() with xrange() in the first code and list(range()) with range().
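The same experiment works without the resource monitor, using sys.getsizeof (Python 3; exact sizes vary by platform):
import sys
print(sys.getsizeof(range(1, 100000)))        # ~48 bytes, constant
print(sys.getsizeof(list(range(1, 100000))))  # hundreds of KB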
From the help docs.
Python 2.7.12
>>> print range.__doc__
range(stop) -> list of integers
range(start, stop[, step]) -> list of integers
Return a list containing an arithmetic progression of integers.
range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
When step is given, it specifies the increment (or decrement).
For example, range(4) returns [0, 1, 2, 3]. The end point is omitted!
These are exactly the valid indices for a list of 4 elements.
>>> print xrange.__doc__
xrange(stop) -> xrange object
xrange(start, stop[, step]) -> xrange object
Like range(), but instead of returning a list, returns an object that
generates the numbers in the range on demand. For looping, this is
slightly faster than range() and more memory efficient.
Python 3.5.2
>>> print(range.__doc__)
range(stop) -> range object
range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
>>> print(xrange.__doc__)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'xrange' is not defined
The difference is apparent. In Python 2.x, range returns a list, and xrange returns an xrange object which is iterable.
In Python 3.x, range becomes the xrange of Python 2.x, and xrange is removed.
range() in Python 2.x
This function is essentially the old range() function that was available in Python 2.x and returns an instance of a list object that contains the elements in the specified range.
However, this implementation is too inefficient when it comes to iterating over a range of numbers. For example, for i in range(1000000) would be a very expensive command to execute, both in terms of memory and time usage, as it requires the whole list to be stored in memory.
range() in Python 3.x and xrange() in Python 2.x
Python 3.x introduced a newer implementation of range() (while the newer implementation was already available in Python 2.x through the xrange() function).
The range() exploits a strategy known as lazy evaluation. Instead of creating a huge list of elements in range, the newer implementation introduces the class range, a lightweight object that represents the required elements in the given range, without storing them explicitly in memory (this might sound like generators but the concept of lazy evaluation is different).
As an example, consider the following:
# Python 2.x
>>> a = range(10)
>>> type(a)
<type 'list'>
>>> b = xrange(10)
>>> type(b)
<type 'xrange'>
and
# Python 3.x
>>> a = range(10)
>>> type(a)
<class 'range'>
For a requirement of scanning/printing 0 to N items, range and xrange work as follows.
range() - creates a new list in memory, taking the whole 0 to N items (N+1 in total), and prints them.
xrange() - creates a lazy instance that scans through the items, keeping only the currently encountered item in memory, hence utilising the same amount of memory all the time.
In case the required element is near the beginning of the sequence, this saves a good amount of time and memory.
range returns a list, while xrange returns an xrange object which takes the same memory irrespective of the range size; with xrange, only one element is generated and available per iteration, whereas with range all the elements are generated at once and held in memory.
The difference decreases for smaller arguments to range(..) / xrange(..):
$ python -m timeit "for i in xrange(10111):" " for k in range(100):" " pass"
10 loops, best of 3: 59.4 msec per loop
$ python -m timeit "for i in xrange(10111):" " for k in xrange(100):" " pass"
10 loops, best of 3: 46.9 msec per loop
In this case xrange(100) is only about 20% more efficient.
range: range will populate everything at once, which means every number of the range will occupy memory.
xrange: xrange is something like a generator; it comes into the picture when you want a range of numbers but don't want them stored, such as when you use it in a for loop. It is memory efficient.
Additionally, list(xrange(...)) is equivalent to range(...); materialising the list is the slow part.
xrange never fully materialises the sequence, which is why it isn't a list; it's an xrange object.
See this post to find the difference between range and xrange:
To quote:
range returns exactly what you think: a list of consecutive
integers, of a defined length beginning with 0. xrange, however,
returns an "xrange object", which acts a great deal like an iterator
