It's my first time posting here. I'm trying to delete a row from a NumPy array inside a numba jitclass. I wrote the following code to locate any row containing 3:
>>> a = np.array([[1,2,3,4],[5,6,7,8]])
>>> a
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> i = np.where(a==3)
>>> i
(array([0]), array([2]))
I cannot use the numpy.delete() function since it is not supported by numba, and I cannot assign None to a row. All I could do is assign 0's to the row:
>>> a[i[0]] = 0
>>> a
array([[0, 0, 0, 0],
       [5, 6, 7, 8]])
But I want to remove the row completely.
Any help will be appreciated.
Thank you very much.
This is in fact not an easy task, since numba has the following restrictions:
no support for np.delete
no support for the axis keyword in np.all and np.any
no support for 2D array indexing (at least not with bool masks)
no or hampered direct creation of bool masks with np.zeros(shape, dtype=np.bool) or similar functions
But still there are several approaches you can take to solve your problem. I tested a few and creating a boolean mask seems to be the fastest and cleanest way.
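import numba as nb
import numpy as np
@nb.njit
def delete_workaround(arr, num):
    # all-True mask with one entry per row
    mask = np.zeros(arr.shape[0], dtype=np.int64) == 0
    # mark every row in which num occurs
    mask[np.where(arr == num)[0]] = False
    return arr[mask]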
a = np.array([[1,2,3,4],[5,6,7,8]])
delete_workaround(a, 3)
This solution also has the huge advantage of preserving your array dimensions, even when only one row or an empty array is returned. This is important for jitclasses, since jitclasses rely heavily on fixed dimensions.
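For reference, here is a loop-based sketch of the same mask idea that avoids np.where and the axis keyword altogether. delete_rows_containing is a hypothetical name; I have only checked it in plain NumPy, and it is an untested assumption that it also compiles under @nb.njit (it only uses loops, a 1D mask built like the one above, and mask indexing):

```python
import numpy as np

def delete_rows_containing(arr, num):
    """Return the rows of the 2D array `arr` that do not contain `num`."""
    # all-True mask, created the same way as in delete_workaround
    keep = np.zeros(arr.shape[0], dtype=np.int64) == 0
    for r in range(arr.shape[0]):
        for c in range(arr.shape[1]):
            if arr[r, c] == num:
                keep[r] = False
                break  # one match is enough to drop the row
    return arr[keep]

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(delete_rows_containing(a, 3))  # [[5 6 7 8]]
```

The early break stops scanning a row as soon as one match is found, and the result stays 2D even when no rows survive.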
Since you requested it, I'll show you a solution which converts arrays to lists and back. Since reflected lists are not yet supported with all Python methods in numba, you'll have to use a wrapper for some parts of the function:
@nb.njit
def delete_lrow(arr_list, num):
    idx_list = []
    for i in range(len(arr_list)):
        if (arr_list[i] != num).all():
            idx_list.append(i)
    res_list = [arr_list[i] for i in idx_list]
    return res_list
def wrap_list_del(arr, num):
    # convert the 2D array into a list of 1D row arrays
    arr_list = list(arr)
    return np.array(delete_lrow(arr_list, num))
arr = np.array([[1,2,3,4],[5,6,7,8],[10,11,5,13],[10,11,3,13],[10,11,99,13]])
arr2 = np.random.randint(0, 256, 100000*4).reshape(-1, 4)
%timeit delete_workaround(arr, 3)
# 1.36 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit wrap_list_del(arr, 3)
# 69.3 µs ± 4.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit delete_workaround(arr2, 3)
# 1.9 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit wrap_list_del(arr2, 3)
# 1.05 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So sticking with arrays if you already have arrays (and even if you don't already have arrays, but your data is of consistent type) is about 50 times faster for small arrays and about 550 times faster for larger arrays.
This is something to remember: Numpy arrays are there for working with numerical data! Numpy is heavily optimized for working with numerical data! There is absolutely no use in converting arrays of numerical data to another "format" if the data type (dtype) is constant and no super-special stuff requires it (I've barely ever encountered such a situation).
And this is especially true for numba-optimized code! Numba relies heavily on NumPy and on constant dtypes/shapes etc., even more so if you want to work with jitclasses.
Welcome to Stack Overflow. You can simply use array indexing to select only the rows that don't contain 3.
The code below is a bit elaborate, to cover some extra details for you; you can write a much shorter version by dropping the unnecessary lines. The key assignment is rows_final = [x for x in range(a.shape[0]) if x not in rows3]
Code:
import numpy as np
a = np.array([[1,2,3,4],[5,6,7,8],[10,11,3,13]])
ind = np.argwhere(a==3)  # one [row, col] pair per match
rows3 = ind[:, 0]
cols3 = ind[:, 1]
print ("Initial Array: \n", a)
print()
print("rows, cols of a==3 : ", rows3, cols3)
rows_final = [x for x in range(a.shape[0]) if x not in rows3]
a_final = a[rows_final,:]
print()
print ("Final Rows: \n", rows_final)
print ("Final Array: \n", a_final)
Output:
Initial Array:
[[ 1 2 3 4]
[ 5 6 7 8]
[10 11 3 13]]
rows, cols of a==3 : [0 2] [2 2]
Final Rows:
[1]
Final Array:
[[5 6 7 8]]
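Outside numba, the list comprehension can also be replaced by a boolean row mask. A short sketch with the same array (has_three is a hypothetical name; it relies on the axis keyword of any(), which the earlier answer lists as unsupported in numba):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [10, 11, 3, 13]])

# True for every row that contains at least one 3
has_three = (a == 3).any(axis=1)
a_final = a[~has_three]
print(a_final)  # [[5 6 7 8]]
```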
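NumPy's delete is now supported in numba, but only with its first two arguments (the array itself and an array containing the indices to delete). Note that without the axis argument, np.delete operates on the flattened array, so it does not remove whole rows of a 2D array directly.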
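To illustrate the flattening, a plain-NumPy sketch (flat_idx and rows_removed are hypothetical names, and I have not checked the second part under numba). Deleting the flat indices of every element of the matching rows, then reshaping, gives row removal without the axis argument:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
rows = np.where(a == 3)[0]               # row indices of the matches: [0]

flat = np.delete(a, rows)                # no axis: deletes index 0 of the flattened array
print(flat)                              # [2 3 4 5 6 7 8]

# flat indices of all elements in the matching rows, then restore the 2D shape
ncols = a.shape[1]
flat_idx = (rows[:, None] * ncols + np.arange(ncols)).ravel()
rows_removed = np.delete(a, flat_idx).reshape(-1, ncols)
print(rows_removed)                      # [[5 6 7 8]]
```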
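I think you need to assign the result of np.delete back to the variable a, since np.delete returns a new array rather than modifying a in place. This worked for me in plain NumPy; try the following code:
import numpy as np
a = np.array([[1,2,3,4],[5,6,7,8]])
print(a)
i = np.where(a==3)
a = np.delete(a, i[0], axis=0) # delete the rows listed in i[0] and assign the result back
print(a)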
Related
I have two ndarrays of different shape.
X.shape = (112800, 28, 28)
Y.shape = (112800,)
X is an array of 28x28 grayscale pictures of handwritten digits and letters (from the EMNIST balanced dataset)
Y is the array which holds the corresponding labels / classifications for all those pictures in X (values ranging from 0..46)
Now I want to filter both arrays by using np.where(), where Y < 16 (the filtered arrays will then only contain the digits 0..9 and the uppercase letters A-F, so that I only look at handwritten hex digits).
I already managed to filter Y.
Y_hex = np.where(Y < 16)[0] # np.where() returns a tuple with one element: the indices where Y < 16
To filter X by the condition Y < 16, I think I need to pass 2 more arguments to np.where() to specify how X is manipulated if the condition is true or false. However, due to the mismatch in shape I haven't figured out what those arguments should be.
I also managed to filter both in a simple for-loop that adds candidates to new lists, but I am curious whether it can be done in one line with np.where() and whether that performs better.
Thanks in advance for answers.
This can be easily done without np.where and simply using a boolean array, that I call idx_hex. This array contains True and False, it contains True where Y < 16 and False where Y >= 16.
idx_hex = Y < 16
Y_hex = Y[idx_hex]
X_hex = X[idx_hex]
Let me know if you need a solution explicitly using np.where
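A small self-contained sketch with made-up arrays of the same layout (synthetic data, not EMNIST): the boolean mask has one entry per sample, so it indexes the first axis of both X and Y, and the indices from the one-argument np.where select the same samples. The three-argument form of np.where is not needed:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 28, 28))   # 10 fake 28x28 "images"
Y = np.arange(10) * 5          # fake labels 0, 5, 10, ..., 45

idx_hex = Y < 16               # boolean mask, one entry per sample
X_hex = X[idx_hex]             # the mask selects along the first axis of X
Y_hex = Y[idx_hex]

rows = np.where(Y < 16)[0]     # one-argument np.where: the matching indices
print(X_hex.shape)             # (4, 28, 28)
```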
Performance
import numpy as np
X = np.random.random(size=(112800, 28, 28))
Y = np.random.randint(low=0, high=40, size=112800)
%timeit idx_hex = Y < 16 ; Y_hex = Y[idx_hex] ; X_hex = X[idx_hex]
%timeit idx_hex = np.where(Y < 16); Y_hex = Y[idx_hex] ; X_hex = X[idx_hex]
returns
149 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
162 ms ± 5.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So the difference is minimal, although np.where is slightly slower.
I have to write a program that multiplies two arrays element by element, like this:
the first number of the first list of the first array is multiplied by the first number of the first list of the second array, and so on. For example:
Input
array1 = [[1,2,3], [3,2,1]]
array2 = [[4,2,5], [5,6,7]]
So my output must be:
result = [[4,4,15],[15,12,7]]
So far my code is the following:
def multiplyArrays(array1,array2):
    if verifySameSize(array1,array2):
        for i in array1:
            for j in i:
                digitA1 = j
                for x in array2:
                    for a in x:
                        digitA2 = a
                        mult = digitA1 * digitA2
        return mult
    return 'Arrays must be the same size'
It's safe to say it's not working, since the result I get for the example above is 7, which is not even an array. What am I doing wrong?
If you want a simple solution, use NumPy:
import numpy as np
array1 = np.array([[1,2,3], [3,2,1]])
array2 = np.array([[4,2,5], [5,6,7]])
result = array1 * array2
If you want a general solution for your own understanding, it becomes a bit harder: how in-depth do you want the implementation to be? There are many checks to consider, for example same sizes, same types, number of dimensions, etc.
The problem in your code is that you use for-each loops instead of indexing. for i in array1 runs twice, yielding a list each time (first [1,2,3], then [3,2,1]). Then you run a for-each loop over each list, yielding numbers, and every assignment overwrites the previous one, so you only get 1 number as the output: the result of the last operation (1 * 7 = 7). You should create an empty list and append your results inside an index-based for loop (not a for-each loop).
So your function becomes:
def multiplyArrays(array1,array2):
    result = []
    for i in range(len(array1)):
        result.append([])
        for j in range(len(array1[i])):
            result[i].append(array1[i][j]*array2[i][j])
    return result
This is a bad idea, though, because it only works with 2D arrays and there are no checks. Avoid writing your own functions unless you absolutely need to.
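If you do want to write your own, a recursive version removes the 2D restriction and adds the most basic checks. This is a sketch, and multiply_nested is a hypothetical name; it assumes the inputs are nested Python lists of numbers:

```python
def multiply_nested(a, b):
    """Element-wise product of two equally shaped nested lists (any depth)."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("Arrays must be the same size")
        # recurse one level deeper, pairing elements at the same index
        return [multiply_nested(x, y) for x, y in zip(a, b)]
    if isinstance(a, list) or isinstance(b, list):
        raise ValueError("Arrays must have the same number of dimensions")
    return a * b  # both are scalars here

print(multiply_nested([[1, 2, 3], [3, 2, 1]], [[4, 2, 5], [5, 6, 7]]))
# [[4, 4, 15], [15, 12, 7]]
```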
You can use zip() to iterate over the lists at the same time:
array1 = [[1,2,3], [3,2,1]]
array2 = [[4,2,5], [5,6,7]]
def multiplyArrays(array1,array2):
    result = []
    for inner1,inner2 in zip(array1,array2):
        inner = []
        for item1,item2 in zip(inner1,inner2):
            inner.append(item1*item2)
        result.append(inner)
    return result
print(multiplyArrays(array1,array2))
Output as requested.
Here are three pure-Python one-liners that yield your expected output, two of which are simply list comprehension versions of the other two answers. List comprehension equivalents are generally more efficient, but you should choose what is most readable for you.
Method 1
(@quamrana's answer, as a list comprehension)
res = [[a * b for a, b in zip(c, d)] for c, d in zip(arr1, arr2)]
Method 2
(@OM222O's answer, as a list comprehension)
res = [[ arr1[i][j] * arr2[i][j] for j in range(len(arr1[0])) ] for i in range(len(arr1))]
Method 3
Similar to Method 1, but uses operator.mul(a, b) (which returns a * b) from the operator module and the built-in map(function, iterable, ...) function. The map function "[r]eturn[s] an iterator that applies function to every item of iterable, yielding the results." So given two lists a (from arr1) and b (from arr2), map(operator.mul, a, b) returns an iterator that yields the product of each element in a with the element in b at the same index. list() converts the results into a list.
import operator
res = [list(map(operator.mul, a, b)) for a, b in zip(arr1, arr2)]
Simple Benchmark
Input
from random import randint
arr1 = [[randint(1, 25) for i in range(1_000)] for j in range(1_000)]
arr2 = [[randint(1, 25) for i in range(1_000)] for j in range(1_000)]
Ordered from fastest to slowest
# Method 3
29.2 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Method 1
44.4 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Method 2
79.3 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# numpy multiplication (inclusive of time required to convert list to array)
81.7 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
We can see that Method 3 (the operator.mul approach) appears fastest and the numpy approach appears the slowest. There is a big caveat, of course, as the numpy timings included the time required to convert the lists to arrays. In order to make meaningful comparisons, we need to specify whether the input and/or output is a list and/or an array. Clearly, if the inputs are already lists and the results must also be lists, then we can be happy with standard Python approaches.
However, if arr1 and arr2 are already numpy arrays, element-wise multiplication is incredibly fast:
1.47 ms ± 5.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
A simpler approach without using any module:
array1 = [[1, 2, 3], [3, 2, 1]]
array2 = [[4, 2, 5], [5, 6, 7]]
result = []
i = 0
while i < len(array1):
    sub_array1 = array1[i]
    sub_array2 = array2[i]
    # unpacking assumes every inner list has exactly three elements
    a, b, c = sub_array1
    d, e, f = sub_array2
    inner_list = [a * d, b * e, c * f]
    result.append(inner_list)
    i += 1
print(result)
Output:
[[4, 4, 15], [15, 12, 7]]
I have an array of two-dimensional arrays named matrices. Each matrix in there is 1000 x 1000 and consists of non-negative values. Now I want to take the log of all values in all the matrices (except for 0). How do I do this easily in Python? I have the following code that does what I want, but knowing Python, this can probably be written more concisely:
newMatrices = []
for matrix in matrices:
    newMaxtrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMaxtrix.append(newRow)
    newMatrices.append(newMaxtrix)
You can convert it into a NumPy array and use numpy.log to calculate the values.
For 0 values, the result will be -inf. After that, you can convert it back to a list and replace the -inf values with 0.
Or you can use np.where:
Example:
res = np.where(arr != 0, np.log(arr), 0)
It replaces all zero elements with 0. np.log is still evaluated on them (producing -inf and a RuntimeWarning), but np.where discards those values.
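To avoid evaluating the log on the zeros at all, np.log (like every ufunc) accepts out= and where= arguments: entries where the condition is False are left as they are in out. A sketch, with log_nonzero as a hypothetical helper name:

```python
import numpy as np

def log_nonzero(matrix):
    """Natural log of the positive entries; all other entries become 0."""
    arr = np.asarray(matrix, dtype=float)
    out = np.zeros_like(arr)              # entries where the condition is False stay 0
    np.log(arr, out=out, where=arr > 0)   # log is only computed where arr > 0
    return out

newMatrices = [log_nonzero(m) for m in [[[1.0, np.e], [0.0, 1.0]]]]
print(newMatrices[0])
```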
While @Amadan's answer is certainly correct (and much shorter/more elegant), it may not be the most efficient in your case (depending a bit on the input, of course), because np.where() will generate an integer index for each matching value. A more efficient approach would be to generate a boolean mask. This has two advantages: (1) it is typically more memory efficient, and (2) the [] operator is typically faster on masks than on integer lists.
To illustrate this, I reimplemented both the np.where()-based and the mask-based solution on a toy input (but with the correct sizes).
I have also included a np.log.at()-based solution which is also quite inefficient.
import numpy as np
def log_matrices_where(matrices):
    return [np.where(matrix > 0, np.log(matrix), 0) for matrix in matrices]
def log_matrices_mask(matrices):
    arr = np.array(matrices, dtype=float)
    mask = arr > 0
    arr[mask] = np.log(arr[mask])
    arr[~mask] = 0 # if the values are always positive this is not needed
    return [x for x in arr]
def log_matrices_at(matrices):
    arr = np.array(matrices, dtype=float)
    mask = arr > 0 # compute the mask before the in-place log changes the values
    np.log.at(arr, mask)
    arr[~mask] = 0 # if the values are always positive this is not needed
    return [x for x in arr]
N = 1000
matrices = [
np.arange((N * N)).reshape((N, N)) - N
for _ in range(2)]
Some sanity checks to make sure all three functions compute the same thing:
# check that the result is the same
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_mask(matrices))))
# True
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_at(matrices))))
# True
And the timings on my machine:
%timeit log_matrices_where(matrices)
# 33.8 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit log_matrices_mask(matrices)
# 11.9 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit log_matrices_at(matrices)
# 153 ms ± 831 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
EDIT: additionally included np.log.at() solution and a note on zeroing out the values for which log is not defined
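Another alternative using NumPy, applied to one matrix at a time (np.log.at modifies its first argument in place, so work on a float copy; np.ndarray(shape) alone would only give uninitialized memory):
arr = np.array(matrix, dtype=float) # float copy of one matrix
np.log.at(arr, np.nonzero(arr)) # in-place log of the non-zero entries only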
As simple as...
import numpy as np
newMatrices = [np.where(matrix != 0, np.log(matrix), 0) for matrix in matrices]
No need to worry about rows and columns, numpy takes care of it. No need to explicitly iterate over matrices in a for loop when a comprehension is readable enough.
EDIT: I just noticed OP had log, not log2. Not really important for the shape of the solution (though likely very important to not getting a wrong answer :P )
As suggested by @R.yan, you can try something like this:
import numpy as np
newMatrices = []
for matrix in matrices:
    newMaxtrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMaxtrix.append(newRow)
    newMatrices.append(newMaxtrix)
# the loop has already applied the log, so only convert the result;
# calling np.log(newArray) again would take the logarithm twice
newArray = np.asarray(newMatrices)
I have an array of objects. I also have a function that requires information from 2 of the objects at a time. I would like to vectorize the call to the function so that it calculates all calls at once, rather than using a loop to go through the necessary pair of objects.
I have gotten this to work if I instead create an array with the necessary data. However this partially defeats the purpose of using objects.
Here is the code. It currently works using the array method and only one line needs to be commented/uncommented in the function to switch to the "object" mode that does not work, but I dearly wish would.
The error I get is: TypeError: only integer arrays with one element can be converted to an index
import numpy as np
import time as time
class ExampleObject():
    def __init__(self, r):
        self.r = r
def ExampleFunction(x):
    """ WHAT I REALLY WANT """
    # answer = exampleList[x].r - exampleList[indexArray].r
    """WHAT I AM STUCK WITH """
    answer = coords[x] - exampleList[indexArray].r
    return answer
indexArray = 5 #arbitrary choice of array index
sizeArray = 1000
exampleList = []
for i in range(sizeArray):
    r = np.random.rand()
    exampleList.append( ExampleObject( r ) )
index_list = np.arange(0,sizeArray,1)
index_list = np.delete(index_list,indexArray)
coords = np.array([h.r for h in exampleList])
answerArray = ExampleFunction(index_list)
The issue is that when I pass the function an array of integers, it doesn't return an array of answers (the vectorization I want) when I use the array (actually, list) of objects. It does work if I use an array (with no objects, just data in each element). But as I said, in my mind this defeats the purpose of storing information on objects in the first place. Do I really need to ALSO store the same information in arrays?
I can't comment, sorry for misusing the answer section...
If the dtype of a NumPy array is Python object, the array only stores pointers to the objects, which are scattered in memory, so the data itself is not contiguous. Vectorizing the operation may improve performance little, if at all. Perhaps you might want to try a NumPy structured array instead.
Assume the objects have attributes a and b, both double-precision floating-point numbers; then:
import numpy as np
numberOfObjects = 6
myStructuredArray = np.zeros(
(numberOfObjects,),
[("a", "f8"), ("b", "f8")],
)
You can initialize an individual attribute of, say, object 0 like this:
myStructuredArray["a"][0] = 1.0
or you can initialize one attribute for all objects at once like this:
myStructuredArray["a"] = [1,2,3,4,5,6]
print(myStructuredArray)
[(1., 0.) (2., 0.) (3., 0.) (4., 0.) (5., 0.) (6., 0.)]
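Applied to the question, a field access returns a plain float64 array, so the vectorized difference the OP wanted can be written directly. A sketch reusing the names above with the values set above (others is a hypothetical name):

```python
import numpy as np

numberOfObjects = 6
myStructuredArray = np.zeros((numberOfObjects,), [("a", "f8"), ("b", "f8")])
myStructuredArray["a"] = [1, 2, 3, 4, 5, 6]

indexArray = 5
others = np.delete(np.arange(numberOfObjects), indexArray)

# field access gives a float64 array, so the subtraction is fully vectorized
answer = myStructuredArray["a"][others] - myStructuredArray["a"][indexArray]
print(answer)  # [-5. -4. -3. -2. -1.]
```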
When given an object-dtype array, NumPy ufuncs iterate through the array and try to apply a corresponding method to each element.
For example, np.abs tries to apply the __abs__ method. Let's add such a method to your class:
In [31]: class ExampleObject():
...:
...:     def __init__(self, r):
...:         self.r = r
...:     def __abs__(self):
...:         return self.r
...:
Now create your arrays:
In [32]: indexArray = 5 #arbitrary choice of array index
...: sizeArray = 10
...:
...: exampleList = []
...: for i in range(sizeArray):
...:     r = np.random.rand()
...:     exampleList.append( ExampleObject( r ) )
...:
...: index_list = np.arange(0,sizeArray,1)
...: index_list = np.delete(index_list,indexArray)
...:
...: coords = np.array([h.r for h in exampleList])
and make an object dtype array from the list:
In [33]: exampleArr = np.array(exampleList)
In [34]: exampleArr
Out[34]:
array([<__main__.ExampleObject object at 0x7fbb541eb9b0>,
<__main__.ExampleObject object at 0x7fbb541eba90>,
<__main__.ExampleObject object at 0x7fbb541eb3c8>,
<__main__.ExampleObject object at 0x7fbb541eb978>,
<__main__.ExampleObject object at 0x7fbb541eb208>,
<__main__.ExampleObject object at 0x7fbb541eb128>,
<__main__.ExampleObject object at 0x7fbb541eb198>,
<__main__.ExampleObject object at 0x7fbb541eb358>,
<__main__.ExampleObject object at 0x7fbb541eb4e0>,
<__main__.ExampleObject object at 0x7fbb541eb048>], dtype=object)
Now we can get an array of the r values by calling the np.abs function:
In [35]: np.abs(exampleArr)
Out[35]:
array([0.28411876298913485, 0.5807617042932764, 0.30566195995294954,
0.39564156171554554, 0.28951905026871105, 0.5500945908978057,
0.40908712567465855, 0.6469497088949425, 0.7480045751535003,
0.710425181488751], dtype=object)
It also works with indexed elements of the array:
In [36]: np.abs(exampleArr[:3])
Out[36]:
array([0.28411876298913485, 0.5807617042932764, 0.30566195995294954],
dtype=object)
This is convenient, but I can't promise speed. In other tests I found that iteration over object dtypes is faster than iteration (in Python) over numeric array elements, but slower than list iteration.
In [37]: timeit np.abs(exampleArr)
3.61 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [38]: timeit [h.r for h in exampleList]
985 ns ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [39]: timeit np.array([h.r for h in exampleList])
3.55 µs ± 88.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
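Along the same lines, np.frompyfunc can wrap an attribute lookup so that it is applied element-wise to an object array, which gives roughly the ExampleFunction the OP wanted. A sketch with fixed values instead of random ones; get_r and example_function are hypothetical names. It still runs Python code per element, so I would not expect it to beat the list comprehension:

```python
import numpy as np

class ExampleObject:
    def __init__(self, r):
        self.r = r

exampleArr = np.array([ExampleObject(v) for v in [0.5, 1.5, 2.0, 4.0]], dtype=object)

# frompyfunc turns a Python callable into a ufunc that is applied to each
# element of an object array (returning an object array)
get_r = np.frompyfunc(lambda obj: obj.r, 1, 1)

def example_function(x, index_array=1):
    # convert the object result to float before subtracting
    return get_r(exampleArr[x]).astype(float) - exampleArr[index_array].r

print(example_function(np.array([0, 2, 3])))
```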
First, sorry for my imperfect English.
My problem is simple to explain, I think.
result={}
list_tuple=[(float,float,float),(float,float,float),(float,float,float)...]#200k tuples
threshold=[float,float,float...] #max 1k values
for tuple in list_tuple:
    for value in threshold:
        if max(tuple)>value and min(tuple)<value:
            if value in result:
                result[value].append(tuple)
            else:
                result[value]=[]
                result[value].append(tuple)
list_tuple contains around 200k tuples, and I have to do this operation very fast (2-3 seconds max on a normal PC).
My first attempt was to do this in Cython with prange() (so I could benefit both from the Cython optimization and from parallel execution), but the problem is (as always) the GIL: in prange() I can handle lists and tuples using Cython memoryviews, but I can't insert my result into a dict.
In Cython I also tried using unordered_map from the C++ std library, but now the problem is that I can't make a vector of arrays in C++ (which would be the value of my dict).
The second problem is similar:
list_tuple=[((float,float),(float,float)),((float,float),(float,float))...]#200k tuples of tuples
result={list_tuple[0][0]:[]}
for tuple in list_tuple:
    if tuple[0] in result:
        result[tuple[0]].append(tuple)
    else:
        result[tuple[0]]=[]
Here I have another problem: if I want to use prange(), I have to use a custom hash function to use an array as the key of a C++ unordered_map.
As you can see, my snippets should be very simple to run in parallel.
I thought about trying numba, but it will probably be the same because of the GIL, and I prefer Cython because I need binary code (this library could be part of commercial software, so only binary libraries are allowed).
In general I would like to avoid C/C++ functions. What I hope to find is a way to handle something like dicts/lists in parallel with Cython performance, staying in the Python domain as much as possible; but I'm open to any advice.
Thanks
Several performance improvements can be achieved, also by using numpy's vectorization features:
1. The min and max values are currently computed anew for each threshold. Instead, they can be precomputed once and then reused for each threshold.
2. The loop over data samples (list_tuple) is performed in pure Python. This loop can be vectorized using NumPy.
In the following tests I used data.shape == (200000, 3); thresh.shape == (1000,) as indicated in the OP. I also omitted modifications to the result dict since depending on the data this can quickly overflow memory.
Applying 1.
v_min = [min(t) for t in data]
v_max = [max(t) for t in data]
for mi, ma in zip(v_min, v_max):
    for value in thresh:
        if ma > value and mi < value:
            pass
This yields a speedup of about 5x compared to the OP's code.
Applying 1. & 2.
v_min = data.min(axis=1)
v_max = data.max(axis=1)
mask = np.empty(shape=(data.shape[0],), dtype=bool)
for t in thresh:
    mask[:] = (v_min < t) & (v_max > t)
    samples = data[mask]
    if samples.size > 0:
        pass
This yields a speedup of about 30x compared to the OP's code. This approach has the additional benefit that it doesn't append to the lists incrementally, which can slow down the program since memory reallocation might be required. Instead, it creates each list (per threshold) in a single step.
@a_guest's code:
def foo1(data, thresh):
    data = np.asarray(data)
    thresh = np.asarray(thresh)
    condition = (
        (data.min(axis=1)[:, None] < thresh)
        & (data.max(axis=1)[:, None] > thresh)
    )
    result = {v: data[c].tolist() for c, v in zip(condition.T, thresh)}
    return result
This code creates a dictionary entry once for each item in thresh.
The OP's code, simplified a bit with defaultdict (from collections):
from collections import defaultdict
def foo3(list_tuple, threeshold):
    result = defaultdict(list)
    for tuple in list_tuple:
        for value in threeshold:
            if max(tuple)>value and min(tuple)<value:
                result[value].append(tuple)
    return result
This one updates a dictionary entry once for each item that meets the criteria.
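The same defaultdict simplification could be applied to the OP's second snippet (grouping tuples by their first element). A minimal sketch, assuming the intent is to collect every tuple under its key, including the first one seen for each key (the OP's else branch only creates an empty list); group_by_first is a hypothetical name:

```python
from collections import defaultdict

def group_by_first(list_tuple):
    result = defaultdict(list)
    for pair in list_tuple:
        # tuples are hashable, so the first sub-tuple can be used as the key directly
        result[pair[0]].append(pair)
    return result

data = [((0.0, 1.0), (2.0, 3.0)), ((0.0, 1.0), (4.0, 5.0)), ((9.0, 9.0), (1.0, 1.0))]
grouped = group_by_first(data)
print(dict(grouped))
```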
And with his sample data:
In [27]: foo1(data,thresh)
Out[27]: {0: [], 1: [[0, 1, 2]], 2: [], 3: [], 4: [[3, 4, 5]]}
In [28]: foo3(data.tolist(), thresh.tolist())
Out[28]: defaultdict(list, {1: [[0, 1, 2]], 4: [[3, 4, 5]]})
time tests:
In [29]: timeit foo1(data,thresh)
66.1 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# In [30]: timeit foo3(data,thresh)
# 161 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [31]: timeit foo3(data.tolist(),thresh.tolist())
30.8 µs ± 56.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Iteration on arrays is slower than with lists. Time for tolist() is minimal; np.asarray for lists is longer.
With a larger data sample, the array version is faster:
In [42]: data = np.random.randint(0,50,(3000,3))
...: thresh = np.arange(50)
In [43]:
In [43]: timeit foo1(data,thresh)
16 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [44]: %%timeit x,y = data.tolist(), thresh.tolist()
...: foo3(x,y)
...:
83.6 ms ± 68.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Edit
Since this approach basically performs an outer product between data samples and threshold values it increases the required memory significantly which might be undesired. An improved approach can be found here. I keep this answer nevertheless for future reference since it was referred to in this answer.
I measured a speedup of about 20x compared to the OP's code.
This is an example using numpy. The data is vectorized and so are the operations. Note that the resulting dict contains empty lists, as opposed to the OP's example, and hence might require an additional cleaning step, if appropriate.
import numpy as np
# Data setup
data = np.random.uniform(size=(200000, 3))
thresh = np.random.uniform(size=1000)
# Compute tuples for thresholds.
condition = (
(data.min(axis=1)[:, None] < thresh)
& (data.max(axis=1)[:, None] > thresh)
)
result = {v: data[c].tolist() for c, v in zip(condition.T, thresh)}
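A possible cleaning step, shown on a small made-up sample so that the result can be checked by hand: keep only the thresholds that at least one sample straddles.

```python
import numpy as np

data = np.array([[0, 1, 2], [3, 4, 5]])
thresh = np.arange(5)

condition = (
    (data.min(axis=1)[:, None] < thresh)
    & (data.max(axis=1)[:, None] > thresh)
)
result = {v: data[c].tolist() for c, v in zip(condition.T, thresh)}

# drop thresholds whose list of matching samples is empty
result = {v: rows for v, rows in result.items() if rows}
```

With this sample, only the thresholds 1 and 4 remain.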