Fastest way to build a list [duplicate]

In Python, as far as I know, there are at least three or four ways to create and initialize a list of a given size:
Simple loop with append:
my_list = []
for i in range(50):
    my_list.append(0)
Simple loop with +=:
my_list = []
for i in range(50):
    my_list += [0]
List comprehension:
my_list = [0 for i in range(50)]
List and integer multiplication:
my_list = [0] * 50
In these examples I don't think there would be any performance difference given that the lists have only 50 elements, but what if I need a list of a million elements? Would using xrange make any improvement? Which is the preferred/fastest way to create and initialize lists in Python?

Let's run some time tests* with timeit.timeit:
>>> from timeit import timeit
>>>
>>> # Test 1
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list.append(0)
... """
>>> timeit(test)
22.384258893239178
>>>
>>> # Test 2
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list += [0]
... """
>>> timeit(test)
34.494779364416445
>>>
>>> # Test 3
>>> test = "my_list = [0 for i in xrange(50)]"
>>> timeit(test)
9.490926919482774
>>>
>>> # Test 4
>>> test = "my_list = [0] * 50"
>>> timeit(test)
1.5340533503559755
>>>
As you can see above, the last method is the fastest by far.
However, it should only be used with immutable items (such as integers), because it creates a list whose elements are all references to the same object.
Below is a demonstration:
>>> lst = [[]] * 3
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are the same
>>> id(lst[0])
28734408
>>> id(lst[1])
28734408
>>> id(lst[2])
28734408
>>>
This behavior is very often undesirable and can lead to bugs in the code.
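To make the pitfall concrete, appending through any one index appears to change every item, since all three slots hold the same list object:
>>> lst = [[]] * 3
>>> lst[0].append(1)
>>> lst
[[1], [1], [1]]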
If you have mutable items (such as lists), then you should use the still very fast list comprehension:
>>> lst = [[] for _ in xrange(3)]
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are different
>>> id(lst[0])
28796688
>>> id(lst[1])
28796648
>>> id(lst[2])
28736168
>>>
*Note: In all of the tests, I replaced range with xrange. Since the latter produces its values lazily instead of building the whole list in memory, it is generally faster than the former.
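For Python 3 readers: xrange is gone and range itself is lazy, so the tests above can be rerun with range substituted back in. A quick sketch (absolute numbers will vary by machine):
from timeit import timeit

# Python 3 equivalents of Tests 3 and 4 above
print(timeit("my_list = [0 for i in range(50)]"))
print(timeit("my_list = [0] * 50"))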

If you want to see how the timings depend on the list length n:
Pure python
I tested for list lengths up to n = 10000 and the behavior remains the same, so the integer multiplication method is the fastest by a clear margin.
Numpy
For lists with more than ~300 elements you should consider numpy.
Benchmark code:
import time

def timeit(f):
    def timed(*args, **kwargs):
        start = time.clock()
        for _ in range(100):
            f(*args, **kwargs)
        end = time.clock()
        return end - start
    return timed

@timeit
def append_loop(n):
    """Simple loop with append"""
    my_list = []
    for i in xrange(n):
        my_list.append(0)

@timeit
def add_loop(n):
    """Simple loop with +="""
    my_list = []
    for i in xrange(n):
        my_list += [0]

@timeit
def list_comprehension(n):
    """List comprehension"""
    my_list = [0 for i in xrange(n)]

@timeit
def integer_multiplication(n):
    """List and integer multiplication"""
    my_list = [0] * n

import numpy as np

@timeit
def numpy_array(n):
    my_list = np.zeros(n)

import pandas as pd

df = pd.DataFrame([(integer_multiplication(n), numpy_array(n)) for n in range(1000)],
                  columns=['Integer multiplication', 'Numpy array'])
df.plot()
Gist here.

There is one more method which, while it sounds weird, is handy in the right circumstances. If you need to produce the same list many times (initializing a matrix for roguelike pathfinding and related stuff, in my case), you can store a copy of the list in a tuple, then turn it back into a list when you need it. It is noticeably quicker than generating the list via a comprehension and, unlike list multiplication, it works with nested data structures (but see the caveat below).
# In class definition
def __init__(self):
    self.l = [[1000 for x in range(1000)] for y in range(1000)]
    self.t = tuple(self.l)

def some_method(self):
    self.l = list(self.t)
    self._do_fancy_computation()
    # self.l is changed by this method

# Later in code:
for a in range(10):
    obj.some_method()
Voila, on every iteration you have a fresh copy of the same list in no time!
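One caveat worth spelling out: list(self.t) is a shallow copy, so the inner lists are shared between the tuple and every "fresh" copy. If the fancy computation mutates the inner lists, those mutations survive into the next iteration, as this short session shows:
>>> t = tuple([[0] for _ in range(3)])
>>> l = list(t)
>>> l[0].append(1)  # mutate an inner list of the "copy"
>>> list(t)         # the next "fresh" copy sees the mutation
[[0, 1], [0], [0]]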
Disclaimer:
I do not have the slightest idea why this is so quick, or whether it works anywhere outside CPython 3.4.

If you want to create a list of incrementing values, i.e. adding 1 every time, use the range function. In range the start argument is included and the end argument is excluded, as shown below:
list(range(10,20))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
If you want to create a list by adding 2 to the previous element each time, use this:
list(range(10,20,2))
[10, 12, 14, 16, 18]
Here the third argument is the step size. You can give any start, end, and step size and create many lists quickly and easily.
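For example, a negative step counts down (standard range behavior; example added for illustration):
list(range(20, 10, -2))
[20, 18, 16, 14, 12]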

Is i = i + n truly the same as i += n? [duplicate]

One block of code works but the other does not, which would make sense except that the second block is the same as the first, only with the operation written in shorthand. They are practically the same operation.
l = ['table']
i = []
Version 1
for n in l:
    i += n
print(i)
Output: ['t', 'a', 'b', 'l', 'e']
Version 2
for n in l:
    i = i + n
print(i)
Output:
TypeError: can only concatenate list (not "str") to list
What is causing this strange error?
They don't have to be the same.
Using the + operator calls the method __add__ while using the += operator calls __iadd__. It is completely up to the object in question what happens when one of these methods is called.
If you use x += y but x does not provide an __iadd__ method (or the method returns NotImplemented), __add__ is used as a fallback, meaning that x = x + y happens.
In the case of lists, using l += iterable actually extends the list l with the elements of iterable. In your case, every character from the string (which is an iterable) is appended during the extend operation.
Demo 1: using __iadd__
>>> l = []
>>> l += 'table'
>>> l
['t', 'a', 'b', 'l', 'e']
Demo 2: using extend does the same
>>> l = []
>>> l.extend('table')
>>> l
['t', 'a', 'b', 'l', 'e']
Demo 3: adding a list and a string raises a TypeError.
>>> l = []
>>> l = l + 'table'
[...]
TypeError: can only concatenate list (not "str") to list
Not using += gives you the TypeError here because only __iadd__ implements the extending behavior.
Demo 4: common pitfall: += does not build a new list. We can confirm this by checking for equal object identities with the is operator.
>>> l = []
>>> l_ref = l # another name for l, no data is copied here
>>> l += [1, 2, 3] # uses __iadd__, mutates l in-place
>>> l is l_ref # confirm that l and l_ref are names for the same object
True
>>> l
[1, 2, 3]
>>> l_ref # mutations are seen across all names
[1, 2, 3]
However, the l = l + iterable syntax does build a new list.
>>> l = []
>>> l_ref = l # another name for l, no data is copied here
>>> l = l + [1, 2, 3] # uses __add__, builds new list and reassigns name l
>>> l is l_ref # confirm that l and l_ref are names for different objects
False
>>> l
[1, 2, 3]
>>> l_ref
[]
In some cases, this can produce subtle bugs, because += mutates the original list, while
l = l + iterable builds a new list and reassigns the name l.
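A common place this difference bites is function arguments. A minimal sketch (the helper names are made up for illustration):
def append_inplace(l):
    l += [1]       # __iadd__: mutates the caller's list

def append_rebind(l):
    l = l + [1]    # __add__: builds a new list, rebinds only the local name

a = []
append_inplace(a)  # a is now [1]
append_rebind(a)   # a is unchanged, still [1]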
BONUS
Ned Batchelder's challenge to find this in the docs
7.2.1. Augmented assignment statements:
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
If, in the second case, you wrap a list around n to avoid the error:
for n in l:
    i = i + [n]
print(i)
you get
['table']
So they are different operations.
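To see how much control the object has over + and +=, here is a minimal sketch of a class implementing both hooks (the class and its behavior are illustrative, not from any library):
class Bag:
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # + builds and returns a brand-new Bag
        return Bag(self.items + list(other))

    def __iadd__(self, other):
        # += extends this Bag in place and returns self
        self.items.extend(other)
        return self

b = Bag([1])
alias = b
b += [2]             # in place: alias sees the change
print(alias.items)   # [1, 2]
b = b + [3]          # new object: alias is left behind
print(alias.items)   # [1, 2]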

Copy values from one list to another without altering the reference in python

In Python, objects such as lists are passed by reference. Assignment with the = operator assigns by reference. So this function:
def modify_list(A):
    A = [1,2,3,4]
takes a reference to a list and labels it A, but then rebinds the local variable A to a new list; the list passed in by the calling scope is not modified.
test = []
modify_list(test)
print(test)
prints []
However I could do this:
def modify_list(A):
    A += [1,2,3,4]
test = []
modify_list(test)
print(test)
prints [1,2,3,4]
How can I assign a list passed by reference to contain the values of another list? What I am looking for is something functionally equivalent to the following, but simpler:
def modify_list(A):
    list_values = [1,2,3,4]
    for i in range(min(len(A), len(list_values))):
        A[i] = list_values[i]
    for i in range(len(list_values), len(A)):
        del A[i]
    for i in range(len(A), len(list_values)):
        A += [list_values[i]]
And yes, I know that this is not a good way to do <whatever I want to do>, I am just asking out of curiosity not necessity.
You can do a slice assignment:
>>> def mod_list(A, new_A):
...     A[:] = new_A
...
>>> liA=[1,2,3]
>>> new=[3,4,5,6,7]
>>> mod_list(liA, new)
>>> liA
[3, 4, 5, 6, 7]
The simplest solution is to use:
def modify_list(A):
    A[::] = [1, 2, 3, 4]
To overwrite the contents of a list with another list (or an arbitrary iterable), you can use the slice-assignment syntax:
A = B = [1,2,3]
A[:] = [4,5,6,7]
print(A) # [4,5,6,7]
print(A is B) # True
Slice assignment is implemented on most of the mutable built-in types. The above assignment is essentially the same as the following:
A.__setitem__(slice(None, None, None), [4,5,6,7])
So the same magic method (__setitem__) is called as for a regular item assignment, only the item index is now a slice object, which represents the range of items to be overwritten. Based on this example you can even support slice assignment in your own types, as sketched below.
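As a rough sketch of that idea (an illustrative class, not a library API), a wrapper can detect the slice and delegate to an underlying list:
class TrackedList:
    """A list wrapper that logs slice assignments (illustrative only)."""
    def __init__(self, items):
        self._items = list(items)

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            print("overwriting range", index)
        self._items[index] = value  # delegate to the underlying list

    def __repr__(self):
        return repr(self._items)

tl = TrackedList([1, 2, 3])
tl[:] = [4, 5, 6, 7]   # prints: overwriting range slice(None, None, None)
print(tl)              # [4, 5, 6, 7]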

python: loop vs comprehension [duplicate]

I was trying to do some simple procedures using lists.
From the book Learning Python I saw the method of using a comprehension, and I also know that a loop can replace it. Now I really want to know which is faster, the loop or the comprehension.
These are my programs.
a = []
for x in range(1, 101):
    a.append(x)
This would set a to [1, 2, 3, ..., 99, 100].
Now this is what I have done with the comprehension.
[x ** 2 for x in a]
This is what I did with the loop.
c = []
for x in a:
    b = [x ** 2]
    c += b
Could anyone suggest a way to find out which of the above is faster? Please also explain how comprehensions differ from loops. Any help is appreciated.
You can use the timeit library, or just use time.time() to time it yourself:
>>> from time import time
>>> def first():
...     ftime = time()
...     _foo = [x ** 2 for x in range(1, 101)]
...     print "First", time()-ftime
...
>>> def second():
...     ftime = time()
...     _foo = []
...     for x in range(1, 101):
...         _b = [x ** 2]
...         _foo += _b
...     print "Second", time()-ftime
...
...
>>> first()
First 5.60283660889e-05
>>> second()
Second 8.79764556885e-05
>>> first()
First 4.88758087158e-05
>>> second()
Second 8.39233398438e-05
>>> first()
First 2.8133392334e-05
>>> second()
Second 7.29560852051e-05
>>>
Evidently, the list comprehension runs faster, by a factor of around 2 to 3.
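For steadier numbers than hand-rolled time.time() deltas, the same comparison can be written with timeit; a sketch (absolute timings vary by machine):
from timeit import timeit

loop = """
c = []
for x in range(1, 101):
    c += [x ** 2]
"""
comp = "c = [x ** 2 for x in range(1, 101)]"

print(timeit(loop, number=100000))
print(timeit(comp, number=100000))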


Selecting a random list element in python

I'm trying to create a function that takes two lists and selects an element at random from each of them. Is there any way to do this using the random.seed function?
You can use random.choice to pick a random element from a sequence (like a list).
If your two lists are list1 and list2, that would be:
a = random.choice(list1)
b = random.choice(list2)
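Wrapped up as the function the question describes, a minimal sketch (the function name is made up; the optional seed is only for reproducible runs):
import random

def pick_one_each(list1, list2, seed=None):
    """Return one random element from each list."""
    if seed is not None:
        random.seed(seed)
    return random.choice(list1), random.choice(list2)

print(pick_one_each([1, 2, 3], ['a', 'b', 'c']))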
Are you sure you want to use random.seed? Seeding initializes the random number generator in a consistent way each time, which can be very useful if you want subsequent runs to be identical, but in general it is not desired. For example, the following function will always return 8, even though it looks like it should randomly choose a number between 0 and 9.
>>> def not_very_random():
...     random.seed(0)
...     return random.choice(range(10))
...
>>> not_very_random()
8
>>> not_very_random()
8
>>> not_very_random()
8
>>> not_very_random()
8
Note: @F.J's solution is much less complicated and better.
Use random.randint to pick a pseudo-random index from the list. Then use that index to select the element:
>>> import random as r
>>> r.seed(14) # used random number generator of ... my head ... to get 14
>>> mylist = [1,2,3,4,5]
>>> mylist[r.randint(0, len(mylist) - 1)]
You can easily extend this to work on two lists.
Why do you want to use random.seed?
Example (using Python 2.7):
>>> import collections as c
>>> c.Counter([mylist[r.randint(0, len(mylist) - 1)] for x in range(200)])
Counter({1: 44, 5: 43, 2: 40, 3: 39, 4: 34})
Is that random enough?
I totally redid my previous answer. Here is a class which wraps a random-number generator (with an optional seed) together with the list. This is a minor improvement over F.J's, because it gives deterministic behavior for testing: calling choice() on the first list should not affect the second list, and vice versa:
import random

class rlist():
    def __init__(self, lst, rg=None, rseed=None):
        self.lst = lst
        if rg is not None:
            self.rg = rg
        else:
            self.rg = random.Random()
        if rseed is not None:
            self.rg.seed(rseed)

    def choice(self):
        return self.rg.choice(self.lst)

if __name__ == '__main__':
    rl1 = rlist([1,2,3,4,5], rseed=1234)
    rl2 = rlist(['a','b','c','d','e'], rseed=1234)
    print 'First call:'
    print rl1.choice(), rl2.choice()
    print 'Second call:'
    print rl1.choice(), rl2.choice()
