I have quite a large number of data sets to extend.
I'm wondering what would be an alternative/faster way of doing it.
I have tried both __iadd__ and extend; both of them take quite a while to produce an output.
from timeit import timeit

raw_data = []
raw_data2 = []
added_data = list(range(100000))

# .__iadd__
def test1():
    for i in range(10):
        raw_data.__iadd__(added_data * i)

# extend
def test2():
    for i in range(10):
        raw_data2.extend(added_data * i)

print(timeit(test1, number=2))
print(timeit(test2, number=2))
I feel like a list comprehension or array mapping could be the answer to my question ...
If you need your data as a list, there is not much to gain - list.extend and __iadd__ are very close in performance; depending on the amount of data, one or the other is fastest:
import timeit
from itertools import repeat, chain

raw_data = []
added_data = range(100000)  # to verify the data, use range(5) and uncomment the prints

def iadd():
    raw_data = []
    for i in range(10):
        raw_data.__iadd__(added_data)
    # print(raw_data)

def extend():
    raw_data = []
    for i in range(10):
        raw_data.extend(added_data)
    # print(raw_data)

def tricked():
    raw_data = list(chain.from_iterable(repeat(added_data, 10)))
    # print(raw_data)

for w, c in (("__iadd__", iadd), ("  extend", extend), (" tricked", tricked)):
    print(w, end=" : ")
    print("{:08.8f}".format(timeit.timeit(c, number=200)))
Output:
# number = 20
__iadd__ : 0.69766775
extend : 0.69303196 # "fastest"
tricked : 0.74638002
# number = 200
__iadd__ : 6.94286992 # "fastest"
extend : 6.96098415
tricked : 7.46355973
If you do not need the data as a list, you might be better off using the generator chain.from_iterable(repeat(added_data, 10)) directly, without ever creating the list, to reduce the amount of memory used.
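For example, a minimal sketch (assuming the data only needs to be iterated over once):

from itertools import chain, repeat

added_data = range(100000)

# consume the repeated data lazily; the ten-fold list is never materialized
total = 0
for value in chain.from_iterable(repeat(added_data, 10)):
    total += value
print(total)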
Related:
Martijn Pieters' answer
I'm unsure if there is a better way to do this, but using numpy and ctypes you can preallocate enough memory for the entire array, then use ctypes.memmove to copy data into raw_data, which is now a ctypes array of ctypes.c_long.
from timeit import timeit
import ctypes
import numpy

def test_iadd():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.__iadd__(added_data)

def test_extend():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.extend(added_data)

def test_memmove():
    added_data = numpy.arange(1000000)  # numpy equivalent of range
    # make a ctypes array large enough to contain all ten copies
    raw_data = (ctypes.c_long * (len(added_data) * 10))()
    # the address to copy to
    raw_data_addr = ctypes.addressof(raw_data)
    # the length of added_data in bytes
    added_data_len = len(added_data) * ctypes.sizeof(ctypes.c_long)
    for i in range(10):
        # copy data for one section
        ctypes.memmove(raw_data_addr, added_data.ctypes.data, added_data_len)
        # update the address to copy to
        raw_data_addr += added_data_len

tests = [test_iadd, test_extend, test_memmove]
for test in tests:
    print('{} {}'.format(test.__name__, timeit(test, number=5)))
This code produced the following results on my PC:
test_iadd 0.648954868317
test_extend 0.640357971191
test_memmove 0.201567173004
This appears to show that using ctypes.memmove is significantly faster.
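If you ultimately need the result as a numpy array rather than a ctypes array, a minimal sketch (my addition, not part of the original answer) is to wrap the same buffer with numpy.ctypeslib.as_array, which shares the memory instead of copying it:

import ctypes
import numpy

added_data = numpy.arange(1000000)
raw_data = (ctypes.c_long * (len(added_data) * 10))()
# ... fill raw_data with ctypes.memmove as in test_memmove above ...

# view the ctypes buffer as a numpy array; no extra copy is made
result = numpy.ctypeslib.as_array(raw_data)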
What is the best way to have better, dynamic control over the decorators - choosing between numba.cuda.jit, numba.jit, and none (pure Python)? Please note that a project can have tens or hundreds of functions, so this should be easy to apply to all of them.
Here is an example from the numba website.
import numba as nb
import numpy as np

# global control of this --> @nb.jit or @nb.cuda.jit or none
# some functions with @nb.jit or @nb.cuda.jit with kwargs like (nopython=True, **other_kwargs)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

a = np.arange(81).reshape(9, 9)
sum2d(a)
You may want something more sophisticated, but a relatively simple solution is redefining jit based on settings. For example:
def _noop_jit(f=None, *args, **kwargs):
    """returns the function unmodified, discarding decorator args"""
    if f is None:
        return lambda x: x
    return f

# some config flag
if settings.PURE_PYTHON_MODE:
    jit = _noop_jit
else:  # etc.
    from numba import jit

@jit(nopython=True)
def f(a):
    return a + 1
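The same idea extends to a three-way switch. As a minimal sketch (settings.BACKEND and its values are hypothetical names of mine, not from the original answer):

# hedged sketch: settings.BACKEND is a hypothetical config string,
# one of "cuda", "cpu" or "python"
if settings.BACKEND == "cuda":
    from numba import cuda
    jit = cuda.jit
elif settings.BACKEND == "cpu":
    from numba import jit
else:
    jit = _noop_jit

Bear in mind that @cuda.jit kernels cannot return values and are launched as kernel[blocks, threads](...), so the call sites differ between backends as well; a simple decorator switch covers the decoration but not those call-site differences.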
I am trying to do some timing comparisons using numba.
What I don't understand in the following mwe.py is why I get such different results:
from __future__ import print_function

import numpy as np
from numba import autojit
import time

def timethis(method):
    '''decorator for timing function calls'''
    def timed(*args, **kwargs):
        ts = time.time()
        result = method(*args, **kwargs)
        te = time.time()
        print('{!r} {:f} s'.format(method.__name__, te - ts))
        return result
    return timed

def pairwise_pure(x):
    '''sample function, compute pairwise distances, see: jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/'''
    M, N = x.shape
    D = np.empty((M, M), dtype=np.float)
    for i in range(M):
        for j in range(M):
            d = 0.
            for k in range(N):
                tmp = x[i, k] - x[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D

# first version
@timethis
@autojit
def pairwise_numba(args):
    return pairwise_pure(args)

# second version
@timethis
def pairwise_numba_alt(args):
    return autojit(pairwise_pure)(args)

x = np.random.random((1000, 10))
pairwise_numba(x)
pairwise_numba_alt(x)
Running python3 mwe.py gives this output:
'pairwise_numba' 5.971631 s
'pairwise_numba_alt' 0.191500 s
In the first version, I decorate the method with timethis to calculate the timings and with autojit to speed up the code, whereas in the second one I decorate the function with timethis only and call autojit(...) inside it.
Does someone have an explanation?
Actually, the documentation explicitly states that for optimization, every call to another function "inside" a decorated function should be decorated as well, or it isn't optimized.
For many functions, such as numpy functions, that isn't necessary since they are already highly optimized, but for native Python functions it is. In your first version, autojit wraps pairwise_numba, whose body just calls the undecorated pairwise_pure, so the triple loop still runs as pure Python and you pay the compilation overhead on top. In the second version, autojit is applied to pairwise_pure itself, so the loops actually get compiled.
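As a minimal sketch of the fix (note that autojit is deprecated in newer numba releases, where numba.jit plays the same role):

# decorate the function that actually contains the loops,
# so numba compiles the triple loop instead of a thin wrapper
pairwise_numba_fixed = timethis(autojit(pairwise_pure))
pairwise_numba_fixed(x)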
Say I have the following example in Python:
import numpy as np, PROGRAMS as prg

testlist = []
x = 0
n = 0
y = [1, 2, 3, 4, 5]
x_fn = np.array([prg.test1(x), prg.test2(x), prg.test3(x)])
for i in range(0, len(x_fn)):
    for j in range(0, len(y)):
        x = y[j]*2
        z = x_fn[i]
        testlist.append(z)
        j = j+1
    i = i+1
print testlist
#####PROGRAMS
def test1(x):
    x = x**2
    return x

def test2(x):
    x = x**3
    return x

def test3(x):
    x = x + 10
    return x
If x isn't defined before x_fn then an error occurs, but if I define it as zero then that is what is used in the calculations. I basically want this code to produce a list using the value of x defined in the second loop:
x = y[j]*2
for all values of y. I know there would be a way around this mathematically, but I would like to solve it by running the same functions and not changing any of the values of y or any of the functions in PROGRAMS.
Basically, is it a good idea to put these functions in an array and run through it element by element, or is there a better way to do it?
Thanks in advance for your replies,
Sven D.
Could this be what you want?
def test1(x):
    x = x**2
    return x

def test2(x):
    x = x**3
    return x

def test3(x):
    x = x + 10
    return x

testlist = []
n = 0
y_vals = [1, 2, 3, 4, 5]
x_fn = [test1, test2, test3]
for fun in x_fn:
    for y in y_vals:
        x = y*2
        z = fun(x)
        testlist.append(z)
print testlist
Functions are objects that can be stored in containers and recalled for use later just like any other object in Python.
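For instance (a small illustration of my own, not from the original answer), the same functions can be keyed by name in a dict and dispatched dynamically:

# map names to function objects; no call parentheses when storing
operations = {'square': test1, 'cube': test2, 'add_ten': test3}
print operations['cube'](2)  # prints 8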
You don't even need to use numpy arrays. Just use a list (functions) and put the test functions in it. Note that I have removed the call argument: the elements of the functions list are references to the functions, so you can call them in your loop.
import PROGRAMS as prg

testlist = []
y = [1, 2, 3, 4, 5]
functions = [prg.test1, prg.test2, prg.test3]
for func in functions:
    for j in y:
        x = j*2
        z = func(x)
        testlist.append(z)
print testlist
#####PROGRAMS
def test1(x):
    x = x**2
    return x

def test2(x):
    x = x**3
    return x

def test3(x):
    x = x + 10
    return x
I rewrote my neural net from pure Python to numpy, but now it is running even slower. So I tried these two functions:
def d():
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    c = [i*j for i, j in zip(a, b)]
    return c

def e():
    a = np.array([1, 2, 3, 4, 5])
    b = np.array([10, 20, 30, 40, 50])
    c = a*b
    return c
timeit d = 1.77135205057
timeit e = 17.2464673758
Numpy is 10 times slower. Why is that, and how do I use numpy properly?
I would assume that the discrepancy is because you're constructing lists and arrays in e whereas you're only constructing lists in d. Consider:
import numpy as np

def d():
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    c = [i*j for i, j in zip(a, b)]
    return c

def e():
    a = np.array([1, 2, 3, 4, 5])
    b = np.array([10, 20, 30, 40, 50])
    c = a*b
    return c

# Warning: functions with mutable default arguments are below.
# This code is only for testing and would be bad practice in production!
def f(a=[1, 2, 3, 4, 5], b=[10, 20, 30, 40, 50]):
    c = [i*j for i, j in zip(a, b)]
    return c

def g(a=np.array([1, 2, 3, 4, 5]), b=np.array([10, 20, 30, 40, 50])):
    c = a*b
    return c

import timeit
print timeit.timeit('d()', 'from __main__ import d')
print timeit.timeit('e()', 'from __main__ import e')
print timeit.timeit('f()', 'from __main__ import f')
print timeit.timeit('g()', 'from __main__ import g')
Here the functions f and g avoid recreating the lists/arrays each time around and we get very similar performance:
1.53083586693
15.8963699341
1.33564996719
1.69556999207
Note that list-comp + zip still wins. However, if we make the arrays sufficiently big, numpy wins hands down:
t1 = [1,2,3,4,5] * 100
t2 = [10,20,30,40,50] * 100
t3 = np.array(t1)
t4 = np.array(t2)
print timeit.timeit('f(t1,t2)','from __main__ import f,t1,t2',number=10000)
print timeit.timeit('g(t3,t4)','from __main__ import g,t3,t4',number=10000)
My results are:
0.602419137955
0.0263929367065
import time, numpy

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    c = [i*j for i, j in zip(a, b)]
    return c

def e():
    a = numpy.array(range(100000))
    b = numpy.array(range(0, 1000000, 10))
    c = a*b
    return c

# python ['0.04s', '0.04s', '0.04s']
# numpy  ['0.02s', '0.02s', '0.02s']
Try it with bigger arrays... even with the overhead of creating the arrays, numpy is much faster.
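The timing harness is not shown in the answer; a minimal sketch that could produce comparable numbers (the three repetitions per function are my assumption):

import time

def time_it(fn, repeats=3):
    # run fn several times and collect wall-clock timings
    timings = []
    for _ in range(repeats):
        start = time.time()
        fn()
        timings.append('{:.2f}s'.format(time.time() - start))
    return timings

print 'python', time_it(d)
print 'numpy ', time_it(e)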
Numpy data structures are slower at appending/construction.
Here are some tests:
from timeit import Timer
setup1 = '''import numpy as np
a = np.array([])'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)
setup2 = 'l = list()'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)
print('appending to empty list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))
setup1 = '''import numpy as np
a = np.array(range(999999))'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)
setup2 = 'l = [x for x in xrange(999999)]'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)
print('appending to large list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))
Results:
appending to empty list:
[0.008171333983972538, 0.0076482562944814175, 0.007862921943675175]
[0.00015624398517267296, 0.0001191077336243837, 0.000118654852507942]
appending to large list:
[2.8521017080411304, 2.8518707386717446, 2.8022625940577477]
[0.0001643958452675065, 0.00017888804099541744, 0.00016711313196715594]
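These numbers reflect the fact that np.append copies the whole array on every call. A common remedy (my addition, not part of the original answer) is to collect values in a Python list and convert once at the end, or to preallocate when the final size is known:

import numpy as np

# build in a list, convert once - one copy total instead of one per append
values = []
for i in range(1000):
    values.append(i * i)
arr = np.array(values)

# or preallocate when the final size is known in advance
arr2 = np.empty(1000)
for i in range(1000):
    arr2[i] = i * i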
I don't think raw speed is the only measure by which numpy should be judged; you must also take into account the time required to write and debug code. The longer the program, the more difficult it is to find problems or add new features (programmer time).
Therefore, a higher-level language allows, for equal programmer time and skill, the creation of complex and potentially more efficient programs.
Anyway, some interesting tools for optimization are:
-Psyco: a JIT (just-in-time) compiler that optimizes the code at runtime.
-Numexpr: parallelization is a good way to speed up the execution of a program, provided the problem is sufficiently separable.
-weave: a module within SciPy for bridging Python and C. One of its functions is blitz, which takes a line of Python, transparently translates it to C, and executes the optimized version on every subsequent call. The first conversion takes around a second, but it generally achieves higher speeds than all of the above. It is not bytecode like Numexpr or Psyco, nor a C interface like NumPy; it is your own function written directly in C, fully compiled and optimized.
-weave is a module within NumPy to communicate Python and C. One of its functions is to blitz, which takes a line of Python, the transparently translates C, and each time the call is executed optimized version. In making this first conversion requires around a second, but higher speeds generally get all of the above. It's not as Numexpr or Psyco bytecode, or interface C as NumPy, but your own function written directly in C and fully compiled and optimized.