Python: vectorizing a function call which uses an array of objects - python

I have an array of objects. I also have a function that requires information from 2 of the objects at a time. I would like to vectorize the call to the function so that it calculates all calls at once, rather than using a loop to go through the necessary pair of objects.
I have gotten this to work if I instead create an array with the necessary data. However this partially defeats the purpose of using objects.
Here is the code. It currently works using the array method and only one line needs to be commented/uncommented in the function to switch to the "object" mode that does not work, but I dearly wish would.
The error I get is: TypeError: only integer arrays with one element can be converted to an index
import numpy as np
import time as time
class ExampleObject():
    def __init__(self, r):
        self.r = r
def ExampleFunction(x):
    """ WHAT I REALLY WANT """
    # answer = exampleList[x].r - exampleList[indexArray].r
    """WHAT I AM STUCK WITH """
    answer = coords[x] - exampleList[indexArray].r
    return answer
indexArray = 5 #arbitrary choice of array index
sizeArray = 1000
exampleList = []
for i in range(sizeArray):
    r = np.random.rand()
    exampleList.append( ExampleObject( r ) )
index_list = np.arange(0,sizeArray,1)
index_list = np.delete(index_list,indexArray)
coords = np.array([h.r for h in exampleList])
answerArray = ExampleFunction(index_list)
The issue is that when I pass the function an array of integers, it doesn't return an array of answers (the vectorization I want) when I use the array (actually, list) of objects. It does work if I use an array (with no objects, just data in each element). But as I have said, this defeats in my mind, the purpose of storing information on objects to begin with. Do I really need to ALSO store the same information in arrays?

I can't comment, sorry for misusing the answer section...
If the dtype of a numpy array is object, the array stores only pointers to Python objects that live elsewhere in memory, so the underlying data is not contiguous. Vectorizing the operation may therefore improve performance little, if at all. You might want to try a numpy structured array instead.
Assume the object has attributes a and b, both double-precision floating-point numbers; then:
import numpy as np
numberOfObjects = 6
myStructuredArray = np.zeros(
    (numberOfObjects,),
    [("a", "f8"), ("b", "f8")],
)
You can initialize an individual attribute of, say, object 0 like this:
myStructuredArray["a"][0] = 1.0
Or you can initialize an attribute for all objects at once like this:
myStructuredArray["a"] = [1,2,3,4,5,6]
print(myStructuredArray)
[(1., 0.) (2., 0.) (3., 0.) (4., 0.) (5., 0.) (6., 0.)]
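Applied to the question, a structured array with a single field r would allow the vectorized call directly. The following is only a minimal sketch under assumptions: the names particles, size_array and index_array are made up for illustration, and the random values stand in for whatever the real objects hold.

```python
import numpy as np

rng = np.random.default_rng(0)
size_array = 1000
index_array = 5  # same arbitrary reference index as in the question

# One structured record per "object"; the single field "r" is a float64.
particles = np.zeros(size_array, dtype=[("r", "f8")])
particles["r"] = rng.random(size_array)

# All indices except the reference one.
index_list = np.delete(np.arange(size_array), index_array)

# Field access gives a plain float64 array view, so fancy indexing and
# subtraction are both vectorized.
answer = particles["r"][index_list] - particles["r"][index_array]
print(answer.shape)
```

The data is stored once, in the array, and each record still groups its fields together, which is close to what the objects were meant to provide.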

When a numpy ufunc is given an object dtype array, it iterates through the array and tries to apply a corresponding method to each element.
For example, np.abs tries to apply the __abs__ method. Let's add such a method to your class:
In [31]: class ExampleObject():
    ...:
    ...:     def __init__(self, r):
    ...:         self.r = r
    ...:     def __abs__(self):
    ...:         return self.r
    ...:
Now create your arrays:
In [32]: indexArray = 5 #arbitrary choice of array index
    ...: sizeArray = 10
    ...:
    ...: exampleList = []
    ...: for i in range(sizeArray):
    ...:     r = np.random.rand()
    ...:     exampleList.append( ExampleObject( r ) )
    ...:
    ...: index_list = np.arange(0,sizeArray,1)
    ...: index_list = np.delete(index_list,indexArray)
    ...:
    ...: coords = np.array([h.r for h in exampleList])
and make an object dtype array from the list:
In [33]: exampleArr = np.array(exampleList)
In [34]: exampleArr
Out[34]:
array([<__main__.ExampleObject object at 0x7fbb541eb9b0>,
<__main__.ExampleObject object at 0x7fbb541eba90>,
<__main__.ExampleObject object at 0x7fbb541eb3c8>,
<__main__.ExampleObject object at 0x7fbb541eb978>,
<__main__.ExampleObject object at 0x7fbb541eb208>,
<__main__.ExampleObject object at 0x7fbb541eb128>,
<__main__.ExampleObject object at 0x7fbb541eb198>,
<__main__.ExampleObject object at 0x7fbb541eb358>,
<__main__.ExampleObject object at 0x7fbb541eb4e0>,
<__main__.ExampleObject object at 0x7fbb541eb048>], dtype=object)
Now we can get an array of the r values by calling the np.abs function:
In [35]: np.abs(exampleArr)
Out[35]:
array([0.28411876298913485, 0.5807617042932764, 0.30566195995294954,
0.39564156171554554, 0.28951905026871105, 0.5500945908978057,
0.40908712567465855, 0.6469497088949425, 0.7480045751535003,
0.710425181488751], dtype=object)
It also works with indexed elements of the array:
In [36]: np.abs(exampleArr[:3])
Out[36]:
array([0.28411876298913485, 0.5807617042932764, 0.30566195995294954],
dtype=object)
This is convenient, but I can't promise speed. In other tests I found that iteration over object dtypes is faster than iteration (in Python) over numeric array elements, but slower than list iteration.
In [37]: timeit np.abs(exampleArr)
3.61 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [38]: timeit [h.r for h in exampleList]
985 ns ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [39]: timeit np.array([h.r for h in exampleList])
3.55 µs ± 88.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
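If overloading __abs__ feels too much like a trick, np.frompyfunc can wrap an ordinary attribute lookup in the same way. This is a rough sketch, not a performance fix: the helper names get_r and example_function are invented here, and the wrapped function still runs once per element in Python.

```python
import numpy as np

class ExampleObject:
    def __init__(self, r):
        self.r = r

rng = np.random.default_rng(0)
example_list = [ExampleObject(r) for r in rng.random(10)]
example_arr = np.array(example_list, dtype=object)

# frompyfunc turns a Python callable into a ufunc (1 input, 1 output).
# The result is an object-dtype array, hence the astype(float) below.
get_r = np.frompyfunc(lambda obj: obj.r, 1, 1)

def example_function(x, ref=5):
    """Difference between r of the objects at indices x and r of object ref."""
    r_values = get_r(example_arr[x]).astype(float)
    return r_values - example_arr[ref].r

index_list = np.delete(np.arange(len(example_arr)), 5)
answer = example_function(index_list)
print(answer.shape)
```

This keeps the objects as the only place where r is stored, at the cost of a Python-level loop hidden inside the ufunc call.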

Related

Apply slicing, conditionals to Sparse Arrays with Parallelization in Python

I want to do something like dynamic programming on a sparse array.
Could you look at the following example function, which I would like to implement for a sparse array?
(The first example is for numpy.array.)
First, importing the modules:
from numba import jit
import numpy as np
from scipy import sparse as sp
from numba import prange
Then the first example:
@jit(parallel=True, nopython=True)
def mytest_csc(inptmat):
    something = np.zeros(inptmat.shape[1])
    for i in prange(inptmat.shape[1]):
        target=0
        partmat = inptmat[:, i]
        for j in range(len(partmat)):
            counter=0
            if partmat[j] > 0:
                new_val = partmat[j] / (partmat[j] + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter+=1
        something[i] = target
    return something
The function above involves:
slicing/indexing the array,
addition and multiplication,
a nested for loop,
parallelization with Numba's prange.
Here is my question: how can I implement this for a sparse array such as scipy.sparse.csc_matrix?
The following is what I have tried.
This function accepts either an np.array or a scipy.sparse.csc_matrix as input, but it cannot be parallelized...
def mytest_csc2(inptmat):
    something = np.zeros(inptmat.shape[1])
    for i in prange(inptmat.shape[1]):
        target=0
        partmat = inptmat[:, i]
        for j in range(len(partmat)):
            counter=0
            if partmat[j] > 0:
                new_val = partmat[j] / (partmat[j] + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter+=1
        something[i] = target
    return something
Parallelization is a must.
Here are the speeds of the above functions.
In the example I made a 100×100 matrix, but in fact I need to process a much bigger matrix, like 100000×100000, so I can't avoid a sparse array...
inptmat=np.zeros((100,100)) #test input matrix,normal numpy array
%%timeit
mytest_csc(inptmat)
16.1 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
inptmat=sp.csc_matrix(inptmat) #test input matrix,scipy.sparse.csc_matrix
%%timeit
mytest_csc2(inptmat)
1.39 s ± 70.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I need to optimize the second test function (mytest_csc2) so that it runs as fast as the Numba example.
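One direction that might help is to stop indexing the sparse matrix object and instead walk its three underlying CSC arrays (exposed by scipy.sparse.csc_matrix as .data, .indices and .indptr). The sketch below is an assumption-laden illustration, not a tested Numba solution: the function name mytest_csc_arrays is made up, the CSC arrays are built by hand so the example needs only numpy, and it assumes row indices are sorted within each column. Because the loop body uses only scalar indexing on 1-D arrays, it is the kind of code Numba's nopython mode is meant for, but whether prange is safe here (something[j] is read while other columns are written) is left open, as in the original.

```python
import numpy as np

def mytest_csc_arrays(data, indices, indptr, n_cols):
    """Same arithmetic as mytest_csc, visiting only the stored entries."""
    something = np.zeros(n_cols)
    for i in range(n_cols):
        target = 0.0
        # Stored entries of column i are data[indptr[i]:indptr[i + 1]];
        # indices[k] is the row index of data[k].
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            value = data[k]
            counter = 0  # reset per entry, exactly as in the original loop
            if value > 0:
                new_val = value / (value + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter += 1
        something[i] = target
    return something

# Hand-built CSC form of the dense matrix
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 0, 0]]
data = np.array([1.0, 4.0, 2.0, 3.0])
indices = np.array([0, 2, 0, 1])
indptr = np.array([0, 2, 2, 4])

result = mytest_csc_arrays(data, indices, indptr, n_cols=3)
print(result)
```

Skipping the zero entries does not change the result, because the original loop only acts on entries greater than zero.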

How to multiply each digit of an array in the same index in Python?

I have to write a program that multiplies arrays like this:
The first number of the first list of the first array with the first number of the first list of the second array. For example:
Input
array1 = [[1,2,3], [3,2,1]]
array2 = [[4,2,5], [5,6,7]]
So my output must be:
result = [[4,4,15],[15,12,7]]
So far my code is the following:
def multiplyArrays(array1,array2):
    if verifySameSize(array1,array2):
        for i in array1:
            for j in i:
                digitA1 = j
        for x in array2:
            for a in x:
                digitA2 = a
        mult = digitA1 * digitA2
        return mult
    return 'Arrays must be the same size'
It's safe to say it's not working since the result I'm getting for the example I gave is 7 , not even an array, so, what am I doing wrong?
If you want a simple solution, use numpy:
import numpy as np
array1 = np.array([[1,2,3], [3,2,1]])
array2 = np.array([[4,2,5], [5,6,7]])
result = array1 * array2
If you want a general solution for your own understanding, it becomes a bit harder: how in-depth do you want the implementation to be? There are many checks to make, for example same sizes, same types, number of dimensions, etc.
The problem in your code is using for-each loops instead of indexing. for i in array1 runs twice, yielding a list each time (first [1,2,3], then [3,2,1]). Then you run a for-each loop over each list, yielding a number, so you only get one number as the output: the result of the last operation (1 * 7 = 7). You should create an empty list and append your results in a normal indexed for loop (not a for-each).
So your function becomes:
def multiplyArrays(array1,array2):
    result = []
    for i in range(len(array1)):
        result.append([])
        for j in range(len(array1[i])):
            result[i].append(array1[i][j]*array2[i][j])
    return result
This is a bad idea though, because it only works with 2D arrays and there are no checks. Avoid writing your own functions unless you absolutely need to.
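To give an idea of what "any depth, with checks" could look like, here is a small recursive sketch. It is only an illustration: multiply_nested is a made-up name, and it assumes the inputs are plain Python lists nested to the same depth with numbers at the leaves.

```python
def multiply_nested(a, b):
    """Multiply two equally shaped nested lists element by element."""
    a_is_list = isinstance(a, list)
    b_is_list = isinstance(b, list)
    if a_is_list != b_is_list:
        raise ValueError("structures differ: a list is paired with a scalar")
    if not a_is_list:
        return a * b  # both are scalars: the actual multiplication
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    # Recurse pairwise; this works for 1D, 2D, 3D, ... nesting alike.
    return [multiply_nested(x, y) for x, y in zip(a, b)]

array1 = [[1, 2, 3], [3, 2, 1]]
array2 = [[4, 2, 5], [5, 6, 7]]
result = multiply_nested(array1, array2)
print(result)  # [[4, 4, 15], [15, 12, 7]]
```

Even this short version needs two separate checks, which is a fair argument for reaching for numpy when it is available.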
You can use zip() to iterate over the lists at the same time:
array1 = [[1,2,3], [3,2,1]]
array2 = [[4,2,5], [5,6,7]]
def multiplyArrays(array1,array2):
    result = []
    for inner1,inner2 in zip(array1,array2):
        inner = []
        for item1,item2 in zip(inner1,inner2):
            inner.append(item1*item2)
        result.append(inner)
    return result
print(multiplyArrays(array1,array2))
Output as requested.
Here are three pure-Python one-liners that yield your expected output, two of which are simply list comprehension versions of the other two answers. List comprehension equivalents are generally more efficient, but you should choose what is most readable for you.
Method 1 @quamrana's, as a list comprehension.
res = [[a * b for a, b in zip(c, d)] for c, d in zip(arr1, arr2)]
Method 2 @OM222O's, as a list comprehension.
res = [[ arr1[i][j] * arr2[i][j] for j in range(len(arr1[0])) ] for i in range(len(arr1))]
Method 3 Similar to Method 1 but makes use of operator.mul(a, b) (returns a * b) from the operator module and the built-in map(function, iterable, ...) function. The map function "[r]eturn[s] an iterator that applies function to every item of iterable, yielding the results." So given two lists a (from array1) and b (from array2), map(operator.mul, a, b) returns an iterator that yields the results of multiplying each element in a with the element in b with the same index. list() converts the results into a list.
res = [list(map(operator.mul, a, b)) for a, b in zip(arr1, arr2)]
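For completeness, here is Method 3 as a self-contained snippet with the import it needs, using the small arrays from the question as sample input. The second part illustrates one caveat worth knowing: both map and zip stop at the shortest input, so mismatched lengths are truncated silently rather than reported.

```python
import operator

arr1 = [[1, 2, 3], [3, 2, 1]]
arr2 = [[4, 2, 5], [5, 6, 7]]

# map pulls one item from each row per step and hands the pair to operator.mul.
res = [list(map(operator.mul, a, b)) for a, b in zip(arr1, arr2)]
print(res)  # [[4, 4, 15], [15, 12, 7]]

# Caveat: with rows of different length, the extra item is dropped silently.
truncated = list(map(operator.mul, [1, 2, 3], [4, 5]))
print(truncated)  # [4, 10]
```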
Simple Benchmark
Input
from random import randint
arr1 = [[randint(1, 25) for i in range(1_000)] for j in range(1_000)]
arr2 = [[randint(1, 25) for i in range(1_000)] for j in range(1_000)]
Ordered from fastest to slowest
# Method 3
29.2 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Method 1
44.4 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Method 2
79.3 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# numpy multiplication (inclusive of time required to convert list to array)
81.7 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
We can see that Method 3 (the operator.mul approach) appears fastest and the numpy approach appears the slowest. There is a big caveat, of course, as the numpy timings included the time required to convert the lists to arrays. In order to make meaningful comparisons, we need to specify whether the input and/or output is a list and/or an array. Clearly, if the inputs are already lists and the results must also be lists, then we can be happy with standard Python approaches.
However, if arr1 and arr2 are already numpy arrays, element-wise multiplication is incredibly fast:
1.47 ms ± 5.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
A simpler approach without using any module (it assumes every inner list has exactly three elements):
array1 = [[1, 2, 3], [3, 2, 1]]
array2 = [[4, 2, 5], [5, 6, 7]]
result = []
i = 0
while i < len(array1):
    sub_array1 = array1[i]
    sub_array2 = array2[i]
    a, b, c = sub_array1
    d, e, f = sub_array2
    inner_list = [a * d, b * e, c * f]
    result.append(inner_list)
    i += 1
print(result)
Output:
[[4, 4, 15], [15, 12, 7]]

Logarithm of two dimensional array in Python

I have an array of two dimensional arrays named matrices. Each matrix in there is of dimension 1000 x 1000 and consists of positive values. Now I want to take the log of all values in all the matrices (except for 0). How do I do this easily in python? I have the following code that does what I want, but knowing Python this can be made more brief:
newMatrices = []
for matrix in matrices:
    newMaxtrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMaxtrix.append(newRow)
    newMatrices.append(newMaxtrix)
You can convert it into a numpy array and use numpy.log to calculate the values.
For 0 values, the result will be -Inf. After that you can convert it back to a list and replace the -Inf with 0.
Or you can use where in numpy.
Example:
res = np.where(arr != 0, np.log2(arr), 0)
Zero elements come out as 0 in the result (log2 is still evaluated on them first, so numpy may emit a divide-by-zero warning).
While @Amadan's answer is certainly correct (and much shorter/more elegant), it may not be the most efficient in your case (depends a bit on the input, of course), because np.where(cond, x, y) evaluates np.log() on every element first (including the non-positive ones) and only then selects. A more efficient approach is to build a boolean mask and apply np.log() only to the masked elements. Compared with the integer indices that the one-argument np.where(cond) returns, a mask has two advantages: (1) it is typically more memory efficient (2) the [] operator is typically faster on masks than on integer lists.
To illustrate this, I reimplemented both the np.where()-based and the mask-based solution on a toy input (but with the correct sizes).
I have also included a np.log.at()-based solution which is also quite inefficient.
import numpy as np
def log_matrices_where(matrices):
    return [np.where(matrix > 0, np.log(matrix), 0) for matrix in matrices]
def log_matrices_mask(matrices):
    arr = np.array(matrices, dtype=float)
    mask = arr > 0
    arr[mask] = np.log(arr[mask])
    arr[~mask] = 0 # if the values are always positive this is not needed
    return [x for x in arr]
def log_matrices_at(matrices):
    arr = np.array(matrices, dtype=float)
    np.log.at(arr, arr > 0)
    arr[~(arr > 0)] = 0 # if the values are always positive this is not needed
    return [x for x in arr]
N = 1000
matrices = [
    np.arange((N * N)).reshape((N, N)) - N
    for _ in range(2)]
(some sanity check to make sure we are doing the same thing)
# check that the result is the same
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_mask(matrices))))
# True
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_at(matrices))))
# True
And the timings on my machine:
%timeit log_matrices_where(matrices)
# 33.8 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit log_matrices_mask(matrices)
# 11.9 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit log_matrices_at(matrices)
# 153 ms ± 831 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
EDIT: additionally included np.log.at() solution and a note on zeroing out the values for which log is not defined
Another alternative using numpy:
arr = np.ndarray((1000,1000))
np.log.at(arr, np.nonzero(arr))
As simple as...
import numpy as np
newMatrices = [np.where(matrix != 0, np.log(matrix), 0) for matrix in matrices]
No need to worry about rows and columns, numpy takes care of it. No need to explicitly iterate over matrices in a for loop when a comprehension is readable enough.
EDIT: I just noticed OP had log, not log2. Not really important for the shape of the solution (though likely very important to not getting a wrong answer :P )
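A related variant, shown here only as a sketch: np.log, like other ufuncs, accepts where= and out= arguments, so the logarithm is computed only at the selected positions and no divide-by-zero warning is raised. The helper name safe_log and the tiny sample matrix are made up for illustration; positions where the condition is false simply keep the zeros from the pre-filled output array.

```python
import numpy as np

def safe_log(matrix):
    """Natural log where matrix > 0; 0 everywhere else."""
    arr = np.asarray(matrix, dtype=float)
    out = np.zeros_like(arr)  # pre-filled with the "else" value
    # The ufunc writes into `out` only where the condition holds.
    np.log(arr, out=out, where=arr > 0)
    return out

matrices = [np.array([[0.0, 1.0], [np.e, 4.0]])]
new_matrices = [safe_log(m) for m in matrices]
print(new_matrices[0])
```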
As suggested by @R.yan, you can try something like this.
import numpy as np
newMatrices = []
for matrix in matrices:
    newMaxtrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMaxtrix.append(newRow)
    newMatrices.append(newMaxtrix)
newArray = np.asarray(newMatrices)
logVal = np.log(newArray)

Delete a row in numpy.array in numba

It's my first time to post something here. I'm trying to delete a row inside a numpy array inside a numba jitclass. I wrote the following code to remove any row containing 3:
>>> a = np.array([[1,2,3,4],[5,6,7,8]])
>>> a
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> i = np.where(a==3)
>>> i
(array([0]), array([2]))
I cannot use the numpy.delete() function since it is not supported by numba, and I cannot assign a None value to the row. All I could do is assign 0's to the row:
>>> a[i[0]] = 0
>>> a
array([[0, 0, 0, 0],
       [5, 6, 7, 8]])
But I want to remove the row completely.
Any help will be appreciated.
Thank you very much.
This is in fact not an easy task, since numba has the following restrictions:
no support for np.delete
no support for the axis keyword in np.all and np.any
no support for 2D array indexing (at least not with bool masks)
no or hampered direct creation of bool masks with np.zeros(shape, dtype=np.bool) or similar functions
But still there are several approaches you can take to solve your problem. I tested a few and creating a boolean mask seems to be the fastest and cleanest way.
import numba as nb
import numpy as np
@nb.njit
def delete_workaround(arr, num):
    mask = np.zeros(arr.shape[0], dtype=np.int64) == 0
    mask[np.where(arr == num)[0]] = False
    return arr[mask]
a = np.array([[1,2,3,4],[5,6,7,8]])
delete_workaround(a, 3)
This solution also has the huge advantage of preserving your array dimensions, even when only one row or an empty array is returned. This is important for jitclasses, since jitclasses rely heavily on fixed dimensions.
Since you asked for it, I'll show a solution that converts arrays to lists and back. Since reflected lists are not yet supported with all Python methods in numba, you'll have to use a wrapper for some parts of the function:
@nb.njit
def delete_lrow(arr_list, num):
    idx_list = []
    for i in range(len(arr_list)):
        if (arr_list[i] != num).all():
            idx_list.append(i)
    res_list = [arr_list[i] for i in idx_list]
    return res_list
def wrap_list_del(arr, num):
    arr_list = list(arr)
    return np.array(delete_lrow(arr_list, num))
arr = np.array([[1,2,3,4],[5,6,7,8],[10,11,5,13],[10,11,3,13],[10,11,99,13]])
arr2 = np.random.randint(0, 256, 100000*4).reshape(-1, 4)
%timeit delete_workaround(arr, 3)
# 1.36 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit wrap_list_del(arr, 3)
# 69.3 µs ± 4.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit delete_workaround(arr2, 3)
# 1.9 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit wrap_list_del(arr2, 3)
# 1.05 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So sticking with arrays if you already have arrays (and even if you don't already have arrays, but your data is of consistent type) is about 50 times faster for small arrays and about 550 times faster for larger arrays.
This is something to remember: Numpy arrays are there for working with numerical data! Numpy is heavily optimized for working with numerical data! There is absolutely no use in converting arrays of numerical data to another "format" if the data type (dtype) is constant and no super-special stuff requires it (I've barely ever encountered such a situation).
And this is especially true for numba optimized code! Numba heavily relies on numpy and constant dtypes/shapes etc. Even more if you want to work with jitclasses.
Welcome to Stack Overflow. You can simply use array indexing to select only the rows that don't contain 3.
The code below is a bit elaborate in order to cover extra details; a much shorter version is possible by dropping the unnecessary lines. The key assignment is rows_final = [x for x in range(a.shape[0]) if x not in rows3]
Code:
import numpy as np
a = np.array([[1,2,3,4],[5,6,7,8],[10,11,3,13]])
ind = np.argwhere(a==3) # one [row, col] pair per match
rows3 = ind[:, 0]
cols3 = ind[:, 1]
print ("Initial Array: \n", a)
print()
print("rows, cols of a==3 : ", rows3, cols3)
rows_final = [x for x in range(a.shape[0]) if x not in rows3]
a_final = a[rows_final,:]
print()
print ("Final Rows: \n", rows_final)
print ("Final Array: \n", a_final)
Output:
Initial Array:
[[ 1 2 3 4]
[ 5 6 7 8]
[10 11 3 13]]
rows, cols of a==3 : [0 2] [2 2]
Final Rows:
[1]
Final Array:
[[5 6 7 8]]
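Outside numba, the same row filter can be written more compactly with a boolean mask. This is only a sketch for plain numpy: it relies on the axis keyword of any(), which the earlier answer lists as unsupported inside numba-compiled code.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [10, 11, 3, 13]])

# True for every row that contains no 3 at all.
keep = ~(a == 3).any(axis=1)
a_final = a[keep]

print(keep)     # [False  True False]
print(a_final)  # [[5 6 7 8]]
```

Indexing with a 1-D boolean mask keeps the result two-dimensional even when only one row, or none, survives.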
Numpy delete is now supported in numba (but only with the first two arguments: the array itself and an array containing the indexes that should be deleted).
I think you need to assign your deletion to variable a again, this worked for me. Try the following code:
import numpy as np
a = np.array([[1,2,3,4],[5,6,7,8]])
print(a)
i = np.where(a==3)
a=np.delete(a, i[0], 0) # i[0] holds the row indices; assign the result back to the variable
print(a)

Performance of map vs starmap?

I was trying to make a pure-python (without external dependencies) element-wise comparison of two sequences. My first solution was:
list(map(operator.eq, seq1, seq2))
Then I found starmap function from itertools, which seemed pretty similar to me. But it turned out to be 37% faster on my computer in worst case. As it was not obvious to me, I measured the time necessary to retrieve 1 element from a generator (don't know if this way is correct):
from operator import eq
from itertools import starmap
seq1 = [1,2,3]*10000
seq2 = [1,2,3]*10000
seq2[-1] = 5
gen1 = map(eq, seq1, seq2)
gen2 = starmap(eq, zip(seq1, seq2))
%timeit -n1000 -r10 next(gen1)
%timeit -n1000 -r10 next(gen2)
271 ns ± 1.26 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
208 ns ± 1.72 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
When retrieving elements, the second solution is about 24% faster. After that, both produce the same results for list, but from somewhere we gain an extra 13% in time:
%timeit list(map(eq, seq1, seq2))
%timeit list(starmap(eq, zip(seq1, seq2)))
5.24 ms ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.34 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I don't know how to dig deeper into profiling such nested code. So my question is: why is the second generator (starmap) so much faster at retrieving elements, and where does the extra 13% in the list call come from?
EDIT:
My first intention was to perform element-wise comparison instead of all, so the all function was replaced with list. This replacement does not affect the timing ratio.
CPython 3.6.2 on Windows 10 (64bit)
There are several factors that contribute (in conjunction) to the observed performance difference:
zip re-uses the returned tuple if it has a reference count of 1 when the next __next__ call is made.
map builds a new tuple that is passed to the "mapped function" every time a __next__ call is made. Actually it probably won't create a new tuple from scratch because Python maintains a storage for unused tuples. But in that case map has to find an unused tuple of the right size.
starmap checks if the next item in the iterable is of type tuple and if so it just passes it on.
Calling a C function from within C code with PyObject_Call won't create a new tuple that is passed to the callee.
So starmap with zip will only use one tuple over and over again that is passed to operator.eq thus reducing the function call overhead immensely. map on the other hand will create a new tuple (or fill a C array from CPython 3.6 on) every time operator.eq is called. So what is actually the speed difference is just the tuple creation overhead.
Instead of linking to the source code I'll provide some Cython code that can be used to verify this:
In [1]: %load_ext cython
In [2]: %%cython
   ...:
   ...: from cpython.ref cimport Py_DECREF
   ...:
   ...: cpdef func(zipper):
   ...:     a = next(zipper)
   ...:     print('a', a)
   ...:     Py_DECREF(a)
   ...:     b = next(zipper)
   ...:     print('a', a)
In [3]: func(zip([1, 2], [1, 2]))
a (1, 1)
a (2, 2)
Yes, tuples aren't really immutable: a simple Py_DECREF was sufficient to "trick" zip into believing no one else holds a reference to the returned tuple!
As for the "tuple-pass-thru":
In [4]: %%cython
   ...:
   ...: def func_inner(*args):
   ...:     print(id(args))
   ...:
   ...: def func(*args):
   ...:     print(id(args))
   ...:     func_inner(*args)
In [5]: func(1, 2)
1404350461320
1404350461320
So the tuple is passed right through (just because these are defined as C functions!) This doesn't happen for pure Python functions:
In [6]: def func_inner(*args):
   ...:     print(id(args))
   ...:
   ...: def func(*args):
   ...:     print(id(args))
   ...:     func_inner(*args)
   ...:
In [7]: func(1, 2)
1404350436488
1404352833800
Note that it also doesn't happen if the called function isn't a C function even if called from a C function:
In [8]: %%cython
   ...:
   ...: def func_inner_c(*args):
   ...:     print(id(args))
   ...:
   ...: def func(inner, *args):
   ...:     print(id(args))
   ...:     inner(*args)
   ...:
In [9]: def func_inner_py(*args):
   ...:     print(id(args))
   ...:
   ...:
In [10]: func(func_inner_py, 1, 2)
1404350471944
1404353010184
In [11]: func(func_inner_c, 1, 2)
1404344354824
1404344354824
So there are a lot of "coincidences" leading up to the point that starmap with zip is faster than calling map with multiple arguments when the called function is also a C function...
One difference I notice is how map retrieves items from the iterables. Both map and zip create a tuple of iterators, one for each iterable passed. zip maintains a result tuple internally that is repopulated every time next is called; map, on the other hand, creates a new array* on each next call and deallocates it afterwards.
*As pointed out by MSeifert, up to 3.5.4 map_next allocated a new Python tuple every time. This changed in 3.6: for up to 5 iterables the C stack is used, and for anything larger the heap is used. Related PRs: Issue #27809: map_next() uses fast call and Add _PY_FASTCALL_SMALL_STACK constant | Issue: https://bugs.python.org/issue27809
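To reproduce the comparison outside IPython, a plain timeit script along these lines should do. It is a sketch only: the helper names with_map and with_starmap are made up, and the numbers it prints depend on the CPython version and machine, so none are quoted here.

```python
import timeit
from itertools import starmap
from operator import eq

seq1 = [1, 2, 3] * 10000
seq2 = [1, 2, 3] * 10000
seq2[-1] = 5  # make the last pair differ, as in the question

def with_map():
    return list(map(eq, seq1, seq2))

def with_starmap():
    return list(starmap(eq, zip(seq1, seq2)))

# Both variants must agree before their timings are worth comparing.
assert with_map() == with_starmap()

calls = 20
for func in (with_map, with_starmap):
    # The minimum over several repeats is the least noisy estimate.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    print(f"{func.__name__}: {best / calls * 1e3:.3f} ms per call")
```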
