Alternatives/faster ways to list.extend in Python?

I have quite a large number of data sets to extend.
I'm wondering what would be an alternative/faster way of doing it.
I have tried both __iadd__ and extend; both of them take quite a while to produce an output.
from timeit import timeit

raw_data = []
raw_data2 = []
added_data = list(range(100000))  # list() so that added_data * i below is valid

# .__iadd__
def test1():
    for i in range(10):
        raw_data.__iadd__(added_data * i)

# extend
def test2():
    for i in range(10):
        raw_data2.extend(added_data * i)

print(timeit(test1, number=2))
print(timeit(test2, number=2))
I feel a list comprehension or array mapping could be an answer to my question...

If you need your data as a list, there is not much to gain: list.extend and __iadd__ are very close in performance, and depending on the amount of data one or the other comes out fastest:
import timeit
from itertools import repeat, chain

raw_data = []
added_data = range(100000)  # to verify the data, uncomment the prints and use range(5)

def iadd():
    raw_data = []
    for i in range(10):
        raw_data.__iadd__(added_data)
    # print(raw_data)

def extend():
    raw_data = []
    for i in range(10):
        raw_data.extend(added_data)
    # print(raw_data)

def tricked():
    raw_data = list(chain.from_iterable(repeat(added_data, 10)))
    # print(raw_data)

for w, c in (("__iadd__", iadd), ("  extend", extend), (" tricked", tricked)):
    print(w, end=" : ")
    print("{:08.8f}".format(timeit.timeit(c, number=200)))
Output:
# number = 20
__iadd__ : 0.69766775
extend : 0.69303196 # "fastest"
tricked : 0.74638002
# number = 200
__iadd__ : 6.94286992 # "fastest"
extend : 6.96098415
tricked : 7.46355973
If you do not need the data as a list, you might be better off consuming the generator chain.from_iterable(repeat(added_data, 10)) directly, without creating the list at all, to reduce the amount of memory used.
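For example, here is a minimal sketch of consuming the chained data lazily; the sum is just a stand-in for whatever processing you actually do:

from itertools import chain, repeat

added_data = range(100000)
# no intermediate list is materialized; items are produced on demand
lazy_data = chain.from_iterable(repeat(added_data, 10))
total = sum(lazy_data)  # replace with your real processing
print(total)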
Related:
Martijn Pieters' answer

I'm unsure if there is a better way to do this, but using numpy and ctypes you can preallocate enough memory for the entire array and then use ctypes.memmove to copy data into raw_data, which is now a ctypes array of ctypes.c_long values.
from timeit import timeit
import ctypes
import numpy

def test_iadd():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.__iadd__(added_data)

def test_extend():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.extend(added_data)

def test_memmove():
    added_data = numpy.arange(1000000)  # numpy equivalent of range
    # make a ctypes array large enough to contain all elements
    # (this assumes added_data's dtype has the same size as ctypes.c_long)
    raw_data = (ctypes.c_long * (len(added_data) * 10))()
    # the address to copy to
    raw_data_addr = ctypes.addressof(raw_data)
    # the length of added_data in bytes
    added_data_len = len(added_data) * ctypes.sizeof(ctypes.c_long)
    for i in range(10):
        # copy data for one section
        ctypes.memmove(raw_data_addr, added_data.ctypes.data, added_data_len)
        # update the address to copy to
        raw_data_addr += added_data_len

tests = [test_iadd, test_extend, test_memmove]
for test in tests:
    print('{} {}'.format(test.__name__, timeit(test, number=5)))
This code produced the following results on my PC:
test_iadd 0.648954868317
test_extend 0.640357971191
test_memmove 0.201567173004
This appears to show that using ctypes.memmove is significantly faster.
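If numpy itself is an acceptable final container, a pure-numpy alternative is to build the repeated block with numpy.tile. This is a sketch added for comparison, not part of the original answer, so benchmark it on your own data:

import numpy

added_data = numpy.arange(1000000)
# repeat the block 10 times into one contiguous, preallocated array
raw_data = numpy.tile(added_data, 10)
print(raw_data.shape)  # (10000000,)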


How to set numba signature with nested lists?

I'm trying to return a nested list, but I'm running into a conversion error. Below is a small piece of code to reproduce the error.
from numba import njit, prange

@njit("ListType(ListType(ListType(int32)))(int32, int32)", fastmath=True, parallel=True, cache=True)
def test(x, y):
    a = []
    for i in prange(10):
        b = []
        for j in range(4):
            c = []
            for k in range(5):
                c.append(k)
            b.append(c)
        a.append(b)
    return a
Error
I try to avoid using empty lists with numba, mainly because an empty list cannot be typed (check out nb.typeof([])).
I am not sure whether your output can be preallocated, but you could consider arrays; there would also be massive performance benefits. Here is an attempt:
from numba import njit, prange, int32
import numpy as np

@njit(int32[:, :, :](int32, int32), fastmath=True, parallel=True, cache=True)
def test(x, y):
    out = np.zeros((10, x, y), dtype=int32)
    for i in prange(10):
        for j in range(x):
            for k in range(y):
                out[i][j][k] = k
    return out
That said, you might indeed need lists for your application, in which case this answer might not be of much use.
This worked for me.
from numba import njit, prange
from numba.typed import List

@njit(fastmath=True, parallel=True, cache=True)
def test(x, y):
    a = List()
    for i in prange(10):
        b = List()
        for j in range(4):
            c = List()
            for k in range(5):
                c.append(k)
            b.append(c)
        a.append(b)
    return a
Your signature is fine, but you need to match the type of list that you create inside the function: a numba.typed.List instead of a plain [].
from numba import njit, prange
from numba.typed import List
from numba.types import int32

@njit("ListType(ListType(ListType(int32)))(int32, int32)", fastmath=True, parallel=True, cache=True)
def test(x, y):
    a = List.empty_list(List.empty_list(List.empty_list(int32)))
    for i in prange(10):
        b = List.empty_list(List.empty_list(int32))
        for j in range(4):
            c = List.empty_list(int32)
            for k in range(5):
                c.append(int32(k))
            b.append(c)
        a.append(b)
    return a
I don't think you should expect much from appending to a List in parallel in this case.

Concurrent.futures not parallelizing loop iterations

I am trying to use concurrent.futures to process a function with multiple threads to efficiently speed up the code.
I have read the documentation and this guide, but I believe I may not be doing this correctly. This MRE should allow us to test a number of different string lengths and list sizes to compare performance:
import pandas as pd, tqdm, string, random
from thefuzz import fuzz, process
from concurrent.futures import ThreadPoolExecutor

def generate_string(items=10, lengths=5):
    return [''.join(random.choice(string.ascii_letters) for i in range(lengths))] * items

def matching(a, b):
    matches = {}
    scorers = {'token_sort_ratio': fuzz.token_sort_ratio, 'token_set_ratio': fuzz.token_set_ratio,
               'partial_token_sort_ratio': fuzz.partial_token_sort_ratio,
               'Quick': fuzz.QRatio, 'Unicode Quick': fuzz.UQRatio,
               'Weighted': fuzz.WRatio, 'Unweighted': fuzz.UWRatio}
    for x in tqdm.tqdm(a):
        best = 0
        for _, scorer in scorers.items():
            res = process.extractOne(x, b, scorer=scorer)
            if res[1] > best:
                best = res[1]
                matches[x] = res
            else:
                continue
    return matches

list_a = generate_string(100, 10)
list_b = generate_string(10, 5)

with ThreadPoolExecutor(max_workers=5) as executor:
    future = executor.submit(matching, list_a, list_b)
This code runs with no error; how can I use multiple workers to execute these loops in parallel so that the code will run faster?
Thanks to a hint from @Anentropic, I was able to use the following change with multiprocessing:
from multiprocessing import Pool
import os

if __name__ == '__main__':
    list_a = generate_string(500, 10)
    list_b = generate_string(500, 10)
    pool = Pool(os.cpu_count() - 2)
    # note: this relies on matching() having been adapted to accept a single
    # (a, b) pair, since pool.map passes each zipped tuple as one argument
    res = pool.map(matching, zip(list_a, list_b))
    norm_res = matching([list_a, list_b])
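Alternatively, sticking with concurrent.futures, here is a minimal sketch of one way to parallelize the original matching() unchanged: split list_a into chunks and hand each chunk to a separate process. The chunking scheme and worker count are my assumptions, not part of the original answer, and it relies on matching() and generate_string() being defined at module level as in the question.

import os
from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    list_a = generate_string(500, 10)
    list_b = generate_string(500, 10)
    workers = max(1, os.cpu_count() - 2)
    chunk_size = -(-len(list_a) // workers)  # ceiling division
    chunks = [list_a[i:i + chunk_size] for i in range(0, len(list_a), chunk_size)]
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # each worker runs matching() on its own slice of list_a against the full list_b
        for partial_result in executor.map(matching, chunks, [list_b] * len(chunks)):
            results.update(partial_result)

Because thefuzz's scoring is CPU-bound, processes rather than threads are what actually provide a parallel speedup here.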

Fastest data structure to return a sequence of items in order

I'm working on a LeetCode problem, and I'm just doing this for the fun of it. I think I can go as far as cracking all of the test inputs of the problem. So say I know ahead of time all of the inputs, in order: what is the fastest structure to return the solutions?
I tried using a dict to map all inputs to the supposed solutions.
class DictSolution():
    DATA = {x: x for x in range(1000000)}
    def compute(self, x):
        return self.DATA[x]
Then, I thought, since I know in what order the inputs will be tested, I don't need to "look it up". So I tried using a set and ignoring all the inputs.
class SetSolution():
    DATA = {i for i in range(1000000)}
    def compute(self, x):
        return self.DATA.pop()
To my surprise, it was slightly slower than the dict, 1-2% slower every time. By the way, here's how I time them:
def test_dict():
    sol = DictSolution()
    for i in range(1000000):
        sol.compute(i)

# test_set is defined the same way, using SetSolution

ds = timeit.timeit(test_dict, number=1)
ss = timeit.timeit(test_set, number=1)
print("Dict Solution:", ds)
print("Set Solution:", ss)
>> Dict Solution: 0.11734077199999998
>> Set Solution: 0.11939082499999998
Questions:
1. Why is the set slower?
2. Logically speaking, returning items in order should be faster than looking each one up in a table, so I don't believe the dict approach is as fast as it gets. What can I do to achieve a better time?
I believe the suggestion from @schwobaseggl is correct. From here, the complexity of accessing an element is O(1) for both dict and list; this was somewhat replicated in this question, and in that setting list was in fact slightly faster than dict. I replicated your benchmark and added other data structures, in full:
List
Dictionary
Set (using pop)
Deque (from collections)
Tuple
The code:
import timeit
from collections import deque

class DictSolution:
    DATA = {x: x for x in range(1000000)}
    def compute(self, x):
        return self.DATA[x]

class SetSolution:
    DATA = {i for i in range(1000000)}
    def compute(self, x):
        return self.DATA.pop()

class DequeSolution:
    DATA = deque(i for i in range(1000000))
    def compute(self, x):
        return self.DATA.popleft()

class ListSolution:
    DATA = [i for i in range(1000000)]
    def compute(self, x):
        return self.DATA[x]

class TupleSolution:
    DATA = tuple(i for i in range(1000000))
    def compute(self, x):
        return self.DATA[x]

def test_dict():
    sol = DictSolution()
    for i in range(1000000):
        sol.compute(i)

def test_set():
    sol = SetSolution()
    for i in range(1000000):
        sol.compute(i)

def test_deque():
    sol = DequeSolution()
    for i in range(1000000):
        sol.compute(i)

def test_list():
    sol = ListSolution()
    for i in range(1000000):
        sol.compute(i)

def test_tuple():
    sol = TupleSolution()
    for i in range(1000000):
        sol.compute(i)

def test_pop_list():
    # PopListSolution (a list consumed with pop(0)) is benchmarked separately below
    sol = PopListSolution()
    for i in range(1000000):
        sol.compute(i)

des = timeit.timeit(test_deque, number=1)
ss = timeit.timeit(test_set, number=1)
ds = timeit.timeit(test_dict, number=1)
ls = timeit.timeit(test_list, number=1)
ts = timeit.timeit(test_tuple, number=1)

times = [("Dict Solution:", ds), ("Set Solution:", ss), ("Deque Solution:", des),
         ("List Solution:", ls), ("Tuple Solution:", ts)]
for label, time in sorted(times, key=lambda e: e[1]):
    print(label, time)
Output
Tuple Solution: 0.1597294129896909
List Solution: 0.16653884798870422
Dict Solution: 0.17414769899914972
Set Solution: 0.190879073983524
Deque Solution: 0.1914772919844836
I ran the script several times and the results were similar, with the tuple solution and the list solution alternating in the lead. Note that SetSolution and DequeSolution were the slowest. So, to answer your questions:
1. Both set and deque are slower because you are removing an element from the collection, whereas in the other structures you are only accessing elements.
2. As partially answered in the previous point: pop does not only return an element from the data structure, it also deletes it, so it is expected that mutating a data structure is slower than merely accessing one of its elements.
Notes
Although pop happens to return the elements in order for these test cases, in general that behaviour is not guaranteed for sets; for instance:
test = {'e' + str(i) for i in range(10)}
while test:
    print(test.pop())
Output (set pop)
e5
e8
e6
e0
e1
e3
e7
e4
e9
e2
More on this topic can be found here.
I also benchmarked a solution using list.pop(0), albeit with a smaller range (100000) and fewer candidates (list, tuple and dict); a sketch of what that solution might look like is shown after the timings below. The results were the following:
('List Solution:', 0.018702030181884766)
('Tuple Solution:', 0.021403074264526367)
('Dict Solution:', 0.02230381965637207)
('List Pop Solution', 1.8658080101013184)
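The original answer did not include the class itself, so this is only a sketch of what a PopListSolution consuming the list with pop(0) presumably looks like:

class PopListSolution:
    DATA = [i for i in range(100000)]
    def compute(self, x):
        # pop(0) removes from the front, which shifts every remaining element
        # and makes each call O(n), hence the much slower timing above
        return self.DATA.pop(0)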
The benchmark was run on the following setup:
Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
16 GB
Ubuntu 16.04
Python 3.5.2
The dict is the fastest lookup data structure because it is implemented using a hash table: looking up a key in a hash table takes nearly constant time. Check out the link below for more info:
This PDF from MIT explains the subject.
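As a quick illustration of that constant-time claim (a sketch added here; exact numbers will vary by machine), compare a membership test against a dict with one against a list, where the latter requires a linear scan:

import timeit

d = {i: i for i in range(1000000)}
l = list(range(1000000))

# hash-based lookup: roughly constant time regardless of the key's position
print(timeit.timeit('999999 in d', globals=globals(), number=1000))
# list membership: scans the list, so the worst case grows with its length
print(timeit.timeit('999999 in l', globals=globals(), number=1000))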

Python key error when using timeit to test dictionary key deletion time

I have become increasingly tired and perplexed trying to get this code to work. This is an algorithm analysis assignment from the "Problem Solving with Data Structures and Algorithms" web textbook. It asks to compare the time it takes to delete a list element and a dictionary element. The test for the list deletion time works fine, but when I try to delete a dictionary element it gives me a KeyError. Can anyone explain why this is so?
import timeit
import pylab

x_list = []
delList_list = []
delDictionary_list = []

delDictionary = timeit.Timer("del x[0]",
                             "from __main__ import x")
delList = timeit.Timer("del x[100]",
                       "from __main__ import x")

for i in range(10000, 100001, 20000):
    x_list.append(i)

    x = list(range(i))
    delListTime = delList.timeit(number=1000)
    delList_list.append(delListTime)

    x = {j: None for j in range(i)}
    delDictTime = delDictionary.timeit(number=1000)
    delDictionary_list.append(delDictTime)

pylab.xlabel('Size')
pylab.ylabel('Time to complete contains operation')
pylab.plot(x_list, delList_list, 'c')
pylab.plot(x_list, delDictionary_list, 'm')
pylab.show()
timeit repeats the code under test, but your dictionary is created only once, not as part of that code. As such, after the first delete, del x[0] raises a KeyError because key 0 is already gone. The list test only appears to work because del x[100] keeps removing whichever element currently sits at index 100, and that index exists for all 1000 repetitions.
You'd have to generate enough copies of the dictionary up-front, and in the test code pick the next dictionary each time. Do the same for the list objects to keep things on an even keel:
delDictionary = timeit.Timer("del next(xiter)[0]",
                             "from __main__ import xiter")
delList = timeit.Timer("del next(xiter)[100]",
                       "from __main__ import xiter")

# ... and in the loop
    x = [list(range(i)) for _ in range(1000)]  # 1000 identical lists
    xiter = iter(x)
    delListTime = delList.timeit(number=1000)
    delList_list.append(delListTime)

    x = [dict.fromkeys(range(i)) for _ in range(1000)]  # 1000 identical dictionaries
    xiter = iter(x)
    delDictTime = delDictionary.timeit(number=1000)
    delDictionary_list.append(delDictTime)
So each test is given a fresh list or dictionary object, making the comparison fair.
Note that I replaced {j: None for j in range(i)} with the far faster dict.fromkeys(range(i)); the latter loops in C code and its default value is None (but watch out when using dict.fromkeys() with a mutable default object: no copies are created).
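To make that last caveat concrete, here is a small illustration (added for clarity, not part of the original answer): every key ends up sharing the same list object, so mutating it through one key is visible through all of them.

shared = dict.fromkeys(range(3), [])
shared[0].append('x')
print(shared)  # {0: ['x'], 1: ['x'], 2: ['x']} -- one shared list, not three copies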

Why is numpy slower than python? How to make code perform better

I rewrote my neural net from pure Python to numpy, but now it runs even slower. So I tried these two functions:
def d():
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    c = [i * j for i, j in zip(a, b)]
    return c

def e():
    a = np.array([1, 2, 3, 4, 5])
    b = np.array([10, 20, 30, 40, 50])
    c = a * b
    return c
timeit d = 1.77135205057
timeit e = 17.2464673758
Numpy is 10 times slower. Why is that, and how do I use numpy properly?
I would assume that the discrepancy is because you're constructing lists and arrays in e whereas you're only constructing lists in d. Consider:
import numpy as np

def d():
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    c = [i * j for i, j in zip(a, b)]
    return c

def e():
    a = np.array([1, 2, 3, 4, 5])
    b = np.array([10, 20, 30, 40, 50])
    c = a * b
    return c

# Warning: functions with mutable default arguments are below.
# This code is only for testing and would be bad practice in production!
def f(a=[1, 2, 3, 4, 5], b=[10, 20, 30, 40, 50]):
    c = [i * j for i, j in zip(a, b)]
    return c

def g(a=np.array([1, 2, 3, 4, 5]), b=np.array([10, 20, 30, 40, 50])):
    c = a * b
    return c

import timeit
print(timeit.timeit('d()', 'from __main__ import d'))
print(timeit.timeit('e()', 'from __main__ import e'))
print(timeit.timeit('f()', 'from __main__ import f'))
print(timeit.timeit('g()', 'from __main__ import g'))
Here the functions f and g avoid recreating the lists/arrays each time around and we get very similar performance:
1.53083586693
15.8963699341
1.33564996719
1.69556999207
Note that list-comp + zip still wins. However, if we make the arrays sufficiently big, numpy wins hands down:
t1 = [1, 2, 3, 4, 5] * 100
t2 = [10, 20, 30, 40, 50] * 100
t3 = np.array(t1)
t4 = np.array(t2)
print(timeit.timeit('f(t1,t2)', 'from __main__ import f,t1,t2', number=10000))
print(timeit.timeit('g(t3,t4)', 'from __main__ import g,t3,t4', number=10000))
My results are:
0.602419137955
0.0263929367065
import time, numpy

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    c = [i * j for i, j in zip(a, b)]
    return c

def e():
    a = numpy.array(range(100000))
    b = numpy.array(range(0, 1000000, 10))
    c = a * b
    return c

# python ['0.04s', '0.04s', '0.04s']
# numpy  ['0.02s', '0.02s', '0.02s']
Try it with bigger arrays: even with the overhead of creating the arrays, numpy is much faster.
Numpy data structures are slower at appending/constructing.
Here are some tests:
from timeit import Timer

setup1 = '''import numpy as np
a = np.array([])'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = list()'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to empty list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))

setup1 = '''import numpy as np
a = np.array(range(999999))'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = [x for x in range(999999)]'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to large list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))
Results:
appending to empty list:
[0.008171333983972538, 0.0076482562944814175, 0.007862921943675175]
[0.00015624398517267296, 0.0001191077336243837, 0.000118654852507942]
appending to large list:
[2.8521017080411304, 2.8518707386717446, 2.8022625940577477]
[0.0001643958452675065, 0.00017888804099541744, 0.00016711313196715594]
I don't think numpy is slow once you also take into account the time required to write and debug code. The longer the program, the more difficult it is to find problems or add new features (programmer time).
Therefore, using a higher-level language allows, for the same amount of time and skill, creating a program that is more complex and potentially more efficient.
Anyway, some interesting tools for optimization are:
- Psyco: a JIT (just-in-time) compiler that optimizes the code at runtime.
- Numexpr: parallelization is a good way to speed up the execution of a program, provided the work is sufficiently separable.
- weave: a module (shipped with SciPy) for mixing Python and C. One of its functions is blitz, which takes a line of Python, transparently translates it to C, and runs an optimized version each time the call is executed. The first conversion takes around a second, but it generally achieves higher speeds than all of the above: it is not bytecode like Numexpr or Psyco, nor a C interface like NumPy, but your own function written directly in C, fully compiled and optimized.
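Since Numexpr is mentioned above, here is a minimal sketch of how it is typically used (assuming the numexpr package is installed); it compiles the expression once and evaluates it in chunks, possibly across multiple threads:

import numpy as np
import numexpr as ne

a = np.arange(1000000)
b = np.arange(0, 10000000, 10)
# evaluates 'a * b' without creating Python-level intermediaries
c = ne.evaluate('a * b')
print(c[:5])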
