Various list concatenation method and their performance

Various list concatenation method and their performance - python

I was working on an algorithm and in that, we are trying to write every line in the code such that it adds up a good performance to the final code.
In one situation we have to add lists (more than two specifically). I know some of the ways to join more than two lists also I have looked upon StackOverflow but none of the answers are giving account on the performance of the method.
Can anyone show, what are the ways we can join more than two lists and their respective performance?
Edit : The size of the list is varying from 2 to 13 (to be specific).
Edit Duplicate : I have been specifically asking for the ways we can add and their respected questions and in duplicate question its limited to only 4 methods

There are multiples ways using which you can join more than two list.
Assuming that we have three list,
a = ['1']
b = ['2']
c = ['3']
Then, for joining two or more lists in python,
1)
You can simply concatenate them,
output = a + b + c
2)
You can do it using list comprehension as well,
res_list = [y for x in [a,b,c] for y in x]
3)
You can do it using extend() as well,
a.extend(b)
a.extend(c)
print(a)
4)
You can do it by using * operator as well,
res = [*a,*b,*c]
For calculating performance, I have used timeit module present in python.
The performance of the following methods are;
4th method < 1st method < 3rd method < 2nd [method on the basis of
time]
That means If you are going to use " * operator " for concatenation of more than two lists then you will get the best performance.
Hope you got what you were looking for.
Edit:: Image showing performance of all the methods (Calculated using timeit)

I did some simple measurements, here are my results:
import timeit
from itertools import chain
a = [*range(1, 10)]
b = [*range(1, 10)]
c = [*range(1, 10)]
tests = ("""output = list(chain(a, b, c))""",
"""output = a + b + c""",
"""output = [*chain(a, b, c)]""",
"""output = a.copy();output.extend(b);output.extend(c);""",
"""output = [*a, *b, *c]""",
"""output = a.copy();output+=b;output+=c;""",
"""output = a.copy();output+=[*b, *c]""",
"""output = a.copy();output += b + c""")
results = sorted((timeit.timeit(stmt=test, number=1, globals=globals()), test) for test in tests)
for i, (t, stmt) in enumerate(results, 1):
print(f'{i}.\t{t}\t{stmt}')
Prints on my machine (AMD 2400G, Python 3.6.7):
1. 6.010000106471125e-07 output = [*a, *b, *c]
2. 7.109999842214165e-07 output = a.copy();output += b + c
3. 7.720000212430023e-07 output = a.copy();output+=b;output+=c;
4. 7.820001428626711e-07 output = a + b + c
5. 1.0520000159885967e-06 output = a.copy();output+=[*b, *c]
6. 1.4030001693754457e-06 output = a.copy();output.extend(b);output.extend(c);
7. 1.4820000160398195e-06 output = [*chain(a, b, c)]
8. 2.525000127207022e-06 output = list(chain(a, b, c))

If you are going to concatenate a variable number of lists together, your input is going to be a list of lists (or some equivalent collection). The performance tests need to take this into account because you are not going to be able to do things like list1+list2+list3.
Here are some test results (1000 repetitions):
option1 += loop 0.00097 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option2 itertools.chain 0.00138 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option3 functools.reduce 0.00174 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option4 comprehension 0.00188 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option5 extend loop 0.00127 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option6 deque 0.00180 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
This would indicate that a += loop through the list of lists is the fastest approach
And the source to produce them:
allLists = [ list(range(10)) for _ in range(5) ]
def option1():
result = allLists[0].copy()
for lst in allLists[1:]:
result += lst
return result
from itertools import chain
def option2(): return list(chain(*allLists))
from functools import reduce
def option3():
return list(reduce(lambda a,b:a+b,allLists))
def option4(): return [ e for l in allLists for e in l ]
def option5():
result = allLists[0].copy()
for lst in allLists[1:]:
result.extend(lst)
return result
from collections import deque
def option6():
result = deque()
for lst in allLists:
result.extend(lst)
return list(result)
from timeit import timeit
count = 1000
t = timeit(lambda:option1(), number = count)
print(f"option1 += loop {t:.5f}",option1()[:15])
t = timeit(lambda:option2(), number = count)
print(f"option2 itertools.chain {t:.5f}",option2()[:15])
t = timeit(lambda:option3(), number = count)
print(f"option3 functools.reduce {t:.5f}",option3()[:15])
t = timeit(lambda:option4(), number = count)
print(f"option4 comprehension {t:.5f}",option4()[:15])
t = timeit(lambda:option5(), number = count)
print(f"option5 extend loop {t:.5f}",option5()[:15])
t = timeit(lambda:option6(), number = count)
print(f"option6 deque {t:.5f}",option6()[:15])

Related

Sort the item from two lists in one list while keeping the sequence in the original lists

I have two unsorted lists as followed:
A = [1, 3, 1.75]
B = [0, 1.5, 2, 4]
I want to make a list that includes the numbers in A and B in a sorted manner (e.g. ascending). However, I want to keep the sequence from each list as well. The suitable output would look like something below:
AB = [0, 1, 1.5, 2, 3, 1.75, 4]
Do you have any ideas/hints on how to do this? The original problem includes 150 lists that need to be merged into one list like above. Thank you for your ideas beforehand!

This looks like a "merge" problem to me:
def merge(lists):
iters = [iter(s) for s in lists]
heads = [next(s) for s in iters]
res = []
inf = float('inf')
while True:
v, n = min((v, n) for n, v in enumerate(heads))
if v == inf:
return res
res.append(v)
try:
heads[n] = next(iters[n])
except StopIteration:
heads[n] = inf
lists = [
[1,2,3,8],
[1,7,4],
[6,9,1,2,3],
]
print(merge(lists))
## [1, 1, 2, 3, 6, 7, 4, 8, 9, 1, 2, 3]

Python adding a list to a slice of another list

Here's basic problem:
>>> listb = [ 1, 2, 3, 4, 5, 6, 7 ]
>>> slicea = slice(2,5)
>>> listb[slicea]
[3, 4, 5]
>>> lista = listb[slicea]
>>> lista
[3, 4, 5]
>>> listb[slicea] += lista
>>> listb
[1, 2, 3, 4, 5, 3, 4, 5, 6, 7]
listb should be
[1, 2, 6, 8, 10, 6, 7]
But 3, 4, 5 was inserted after 3, 4, 5 not added to it.
tl;dr
I have this code that's not working:
self.lib_tree.item(song)['values'][select_values] = adj_list
self.lib_tree.item(album)['values'][select_values] += adj_list
self.lib_tree.item(artist)['values'][select_values] += adj_list
The full code is this:
def toggle_select(self, song, album, artist):
# 'values' 0=Access, 1=Size, 2=Selected Size, 3=StatTime, 4=StatSize,
# 5=Count, 6=Seconds, 7=SelSize, 8=SelCount, 9=SelSeconds
# Set slice to StatSize, Count, Seconds
total_values = slice(4, 7) # start at index, stop before index
select_values = slice(7, 10) # start at index, stop before index
tags = self.lib_tree.item(song)['tags']
if "songsel" in tags:
# We will toggle off and subtract from selected parent totals
tags.remove("songsel")
self.lib_tree.item(song, tags=(tags))
# Get StatSize, Count and Seconds
adj_list = [element * -1 for element in \
self.lib_tree.item(song)['values'][total_values]]
else:
tags.append("songsel")
self.lib_tree.item(song, tags=(tags))
# Get StatSize, Count and Seconds
adj_list = self.lib_tree.item(song)['values'][total_values] # 1 past
self.lib_tree.item(song)['values'][select_values] = adj_list
self.lib_tree.item(album)['values'][select_values] += adj_list
self.lib_tree.item(artist)['values'][select_values] += adj_list
if self.debug_toggle < 10:
self.debug_toggle += 1
print('artist,album,song:',self.lib_tree.item(artist, 'text'), \
self.lib_tree.item(album, 'text'), \
self.lib_tree.item(song, 'text'))
print('adj_list:',adj_list)
The adj_list has the correct values showing up in debug.
How do I add a list of values to the slice of a list?

The behavior you want is not a feature of any Python built-in type; + with built-in sequences means concatenation, not element-wise addition. But numpy arrays will do what you want, so I'd suggest looking into numpy. Simple example:
>>> import numpy as np
>>> a = np.array([2,3,4], dtype=np.int64)
>>> b = np.array([5,6,7], dtype=np.int64)
>>> a += b
>>> a
array([ 7, 9, 11])
>>> print(a)
[ 7 9 11]
>>> print(a.tolist())
[7, 9, 11]
Note that the output (both repr and str forms) looks a little different from Python lists, but you can convert back to a plain Python list if needed.

Encode array of integers into unique int

I have a fixed amount of int arrays of the form:
[a,b,c,d,e]
for example:
[2,2,1,1,2]
where a and b can be ints from 0 to 2, c and d can be 0 or 1, and e can be ints from 0 to 2.
Therefore there are: 3 * 3 * 2 * 2 * 3: 108 possible arrays of this form.
I would like to assign to each of those arrays a unique integer code from 0 to 107.
I am stuck, i thought of adding each numbers in the array, but two arrays such as:
[0,0,0,0,1] and [1,0,0,0,0]
would both add to 1.
Any suggestion?
Thank you.

You could use np.ravel_multi_index:
>>> np.ravel_multi_index([1, 2, 0, 1, 2], (3, 3, 2, 2, 3))
65
Validation:
>>> {np.ravel_multi_index(j, (3, 3, 2, 2, 3)) for j in itertools.product(*map(range, (3,3,2,2,3)))} == set(range(np.prod((3, 3, 2, 2, 3))))
True
Going back the other way:
>>> np.unravel_index(65, dims=(3, 3, 2, 2, 3))
(1, 2, 0, 1, 2)

Just another way, similar to Horner's method for polynomials:
>>> array = [1, 2, 0, 1, 2]
>>> ranges = (3, 3, 2, 2, 3)
>>> reduce(lambda i, (a, r): i * r + a, zip(array, ranges), 0)
65
Unrolled that's ((((0 * 3 + 1) * 3 + 2) * 2 + 0) * 2 + 1) * 3 + 2 = 65.

This is a little like converting digits from a varying-size number base to a standard integer. In base-10, you could have five digits, each from 0 to 9, and then you would convert them to a single integer via i = a*10000 + b*1000 + c*100 + d*10 + e*1.
Equivalently, for the decimal conversion, you could write i = np.dot([a, b, c, d, e], bases), where bases = [10*10*10*10, 10*10*10, 10*10, 10, 1].
You can do the same thing with your bases, except that your positions introduce multipliers of [3, 3, 2, 2, 3] instead of [10, 10, 10, 10, 10]. So you could set bases = [3*2*2*3, 2*2*3, 2*3, 3, 1] (=[36, 12, 6, 3, 1]) and then use i = np.dot([a, b, c, d, e], bases). Note that this will always give answers in the range of 0 to 107 if a, b, c, d, and e fall in the ranges you specified.
To convert i back into a list of digits, you could use something like this:
digits = []
remainder = i
for base in bases:
digit, remainder = divmod(remainder, base)
digits.append(digit)
On the other hand, to keep your life simple, you are probably better off using Paul Panzer's answer, which pretty much does the same thing. (I never thought of an n-digit number as the coordinates of a cell in an n-dimensional grid before, but it turns out they're mathematically equivalent. And np.ravel is an easy way to assign a serial number to each cell.)

This data is small enough that you may simply enumerate them:
>>> L = [[a,b,c,d,e] for a in range(3) for b in range(3) for c in range(2) for d in range(2) for e in range(3)]
>>> L[0]
[0, 0, 0, 0, 0]
>>> L[107]
[2, 2, 1, 1, 2]
If you need to go the other way (from the array to the integer) make a lookup dict for it so that you will get O(1) instead of O(n):
>>> lookup = {tuple(x): i for i, x in enumerate(L)}
>>> lookup[1,1,1,1,1]
58

getting dot-product of your vectors as following:
In [210]: a1
Out[210]: array([2, 2, 1, 1, 2])
In [211]: a2
Out[211]: array([1, 0, 1, 1, 0])
In [212]: a1.dot(np.power(10, np.arange(5,0,-1)))
Out[212]: 221120
In [213]: a2.dot(np.power(10, np.arange(5,0,-1)))
Out[213]: 101100
should produce 108 unique numbers - use their indices...

If the array lenght is not very huge, you can calculate out the weight first, then use simple math formula to get the ID.
The code will be like:
#Test Case
test1 = [2, 2, 1, 1, 2]
test2 = [0, 2, 1, 1, 2]
test3 = [0, 0, 0, 0, 2]
def getUniqueID(target):
#calculate out the weights first;
#When Index=0; Weight[0]=1;
#When Index>0; Weight[Index] = Weight[Index-1]*(The count of Possible Values for Previous Index);
weight = [1, 3, 9, 18, 36]
return target[0]*weight[0] + target[1]*weight[1] + target[2]*weight[2] + target[3]*weight[3] + target[4]*weight[4]
print 'Test Case 1:', getUniqueID(test1)
print 'Test Case 2:', getUniqueID(test2)
print 'Test Case 3:', getUniqueID(test3)
#Output
#Test Case 1: 107
#Test Case 2: 105
#Test Case 3: 72
#[Finished in 0.335s]

Iterate through each value of list in order, starting at random value

Given the following code:
length = 10
numbers = [x for x in range(length)]
start_index = randint(0,length-1)
# now output each value in order from start to start-1 (or end)
# ex. if start = 3 --> output = 3,4,5,6,7,8,9,0,1,2
# ex if start = 9 ---> output = 9,0,1,2,3,4,5,6,7,8
What is the best / simplest / most pythonic / coolest way to iterate over the list and print each value sequentially, beginning at start and wrapping to start-1 or the end if the random value were 0.
Ex. start = 3 then output = 3,4,5,6,7,8,9,1,2
I can think of some ugly ways (try, except IndexError for example) but looking for something better. Thanks!
EDIT: made it clearer that start is the index value to start at

You should use the % (modulo) operator.
length = 10
numbers = [x for x in range(length)]
start = randint(0, length)
for i in range(length):
n = numbers[(i + start) % length]
print(n)

>>> start = randint(0, len(numbers))
>>> start
1
You can use list slicing then iterate over that
>>> numbers[start:] + numbers[:start]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
You can also use the modulus % operator in a list comprehension
>>> [numbers[i%len(numbers)] for i in range(start, start + len(numbers))]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

What is the best / simplest / most pythonic / coolest way ...
You can use collections.deque and its rotate function, like this
>>> from collections import deque
>>> d = deque(numbers)
>>> d.rotate(-9)
>>> d
deque([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])
>>>
>>> d = deque(numbers)
>>> d.rotate(-2)
>>> d
deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])

You can try to iterate over the list with simple conditional loops
i = start
while(True):
print i,
if i==numbers[-1]: # If it's the last number
i=numbers[0]
else:
i += 1
if i==start: # One iteration is over
break
This will print 3 4 5 6 7 8 9 0 1 2

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
Thus, grouping a list (in terms of python) according to the values of another list (according to the order of the factors).
Is there anything handy in python like that, except from the itertools.groupby approach?

From your example, it looks like each element in b contains the 1-indexed list in which the node will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can do zero-indexed lists, and you only need two lists (i.e., for your R use case, 1 and 2 are the only values, in python they'll be 0 and 1)
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress:
def split(x, f):
return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
count = max(f) + 1
return tuple( list(itertools.compress(x, (el == i for el in f))) for i in xrange(count) )
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])

Edit: warning, this a groupby solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
b = zip(*zip(range(len(a)), itertools.cycle((1,2))))[1]
{k: zip(*g)[1] for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.

As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
res = defaultdict(list)
for v, k in zip(x, f):
res[k].append(v)
return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})

You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To make this generalise you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Various list concatenation method and their performance - python

Related

Sort the item from two lists in one list while keeping the sequence in the original lists

Python adding a list to a slice of another list

Encode array of integers into unique int

Iterate through each value of list in order, starting at random value

Python equivalent of R "split"-function

Categories

Resources