I have data of shape d × N (each column is a feature vector).
I have this code for calculating the kernel matrix:
def kernel(x1, x2):
    return x1.T @ x2

data = np.array([[1,2,3], [1,2,3], [1,2,3]])
result = []
for i in range(data.shape[1]):
    current_result = []
    for j in range(data.shape[1]):
        x1 = data[:, i]
        x2 = data[:, j]
        current_result.append(kernel(x1, x2))
    result.append(current_result)
np.array(result)
and I am getting this result:
array([[ 3,  6,  9],
       [ 6, 12, 18],
       [ 9, 18, 27]])
The problem is that this code is too slow, so I tried to use np.vectorize:
vec = np.vectorize(kernel, signature='(n),(n)->()')
vec(data, data)
But I am getting the wrong result:
array([14, 14, 14])
what am I doing wrong?
Testing with larger dimensions of your problem, for instance (100, 200), and with random numbers to ensure robustness, there are several ways:
import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

def kernel_kenny(a):
    result = []
    for i in range(a.shape[1]):
        current_result = []
        for j in range(a.shape[1]):
            x1 = a[:, i]
            x2 = a[:, j]
            current_result.append(kernel(x1, x2))
        result.append(current_result)
    return np.array(result)

a = np.random.random((100,200))
res1 = kernel_kenny(a)

# perhaps the einsum signature helps you understand the calculation
res2 = np.einsum('ji,jk->ik', a, a, optimize=True)
# or the following if you want to explicitly specify the transpose
# res2 = np.einsum('ij,jk->ik', a.T, a, optimize=True)
# or simply ...
res3 = a.T @ a
Here are the sanity checks:
np.allclose(res1,res2)
>>> True
np.allclose(res1,res3)
>>> True
and timings:
%timeit kernel_kenny(a)
>>> 83.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ji,jk->ik', a, a, optimize=True)
>>> 325 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a.T @ a
>>> 82 µs ± 9.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
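As for what the np.vectorize attempt got wrong: with signature='(n),(n)->()', vec(data, data) broadcasts the two arguments against each other and pairs row i of the first with row i of the second, so each output element is a row dotted with itself ([1,2,3] · [1,2,3] = 14, hence array([14, 14, 14])). To get all (i, j) pairs you have to insert the broadcast axes yourself; a sketch (correct, but still far slower than a.T @ a):
vec = np.vectorize(kernel, signature='(n),(n)->()')
# columns of a become rows of a.T; the length-1 axes broadcast over all pairs
res4 = vec(a.T[:, None, :], a.T[None, :, :])
np.allclose(res1, res4)
>>> True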
I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized as:
The average of the peak of each section, for each row.
Here is the code with a sample array and list of indices. Sections can be of different sizes.
x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]

result = np.empty((1, x.shape[0]))
for i, row in enumerate(x):
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized numpy way to suppress both loops?
You can just drop the for loop and use the axis argument instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x, indices, 1)], axis=0)
Output:
array([3., 7.])
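For reference, this is how the one-liner decomposes on the sample x and indices (a step-by-step sketch; the intermediate names are mine):
sections = np.array_split(x, indices, axis=1)  # two (2, 2) blocks: columns [0:2] and [2:4]
peaks = [np.max(s, axis=1) for s in sections]  # per-row max of each block: [2 6] and [4 8]
np.mean(peaks, axis=0)                         # array([3., 7.])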
Benchmark:
x_large = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation (recomputing result2 on x_large so it matches the loop above):
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
np.array_equal(result, result2)
# True
I have a massive array with rows and columns. Some rows are larger than others. I need to get the maximum row length, that is, the length of the longest row. I wrote a simple function for this, but I want it to be as fast as possible, numpy fast. Currently it looks like this:
Example array:
values = [
    [1,2,3],
    [4,5,6,7,8,9],
    [10,11,12,13]
]

def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width
Is there any way to accomplish this with numpy?
In [261]: values = [
     ...:     [1,2,3],
     ...:     [4,5,6,7,8,9],
     ...:     [10,11,12,13]
     ...: ]

In [262]: values
Out[262]: [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]

In [263]: def values_max_width(values):
     ...:     max_width = 1
     ...:     for row in values:
     ...:         if len(row) > max_width:
     ...:             max_width = len(row)
     ...:     return max_width
In [264]: values_max_width(values)
Out[264]: 6
In [265]: [len(v) for v in values]
Out[265]: [3, 6, 4]
In [266]: max([len(v) for v in values])
Out[266]: 6
In [267]: np.max([len(v) for v in values])
Out[267]: 6
Your loop and the list comprehension are similar in speed; np.max is much slower, since it first has to turn the list into an array.
In [268]: timeit max([len(v) for v in values])
656 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [269]: timeit np.max([len(v) for v in values])
13.9 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [271]: timeit values_max_width(values)
555 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you are starting with a list, it's a good idea to thoroughly test the list implementation. numpy is fast when it is doing compiled array stuff, but creating an array from a list is time-consuming.
Making an array directly from values isn't much help. The result is an object dtype array:
In [272]: arr = np.array(values)
In [273]: arr
Out[273]:
array([list([1, 2, 3]), list([4, 5, 6, 7, 8, 9]), list([10, 11, 12, 13])],
      dtype=object)
Math on such an array is hit-or-miss, and always slower than math on pure numeric arrays. We can iterate on such an array, but that iteration is slower than on a list.
In [275]: values_max_width(arr)
Out[275]: 6
In [276]: timeit values_max_width(arr)
1.3 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Not sure how you can make it faster. I've tried using np.max over the length of each item, but that will take even longer:
import numpy as np
import time

values = []
for k in range(100000):
    values.append(list(np.random.randint(100, size=np.random.randint(1000))))

def timeit(func):
    def wrapper(*args, **kwargs):
        now = time.time()
        retval = func(*args, **kwargs)
        print('{} took {:.5f}s'.format(func.__name__, time.time() - now))
        return retval
    return wrapper

@timeit
def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width

@timeit
def value_max_width_len(values):
    return np.max([len(l) for l in values])

values_max_width(values)
value_max_width_len(values)
values_max_width took 0.00598s
value_max_width_len took 0.00994s
Edit:
As @Mstaino suggested, using map does make this code faster:
@timeit
def value_max_width_len(values):
    return max(map(len, values))

values_max_width took 0.00598s
value_max_width_len took 0.00499s
I have two numpy arrays, A and B. A contains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if A is sorted and you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])
Just for completeness: if the values in A are non-negative and reasonably small (the lookup table needs max(A) + 1 entries):
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]
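A quick check of this lookup-table approach on the sample arrays from the question (a sketch; the comments are mine):
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
lookup = np.empty(np.max(A) + 1, dtype=int)  # one slot per possible value in A
lookup[A] = np.arange(len(A))                # map each value to its index in A
lookup[B]
>>> array([0, 6, 9])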
I had the same question recently. However, timing performance was critical for me, so I guess a timing comparison of the different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where or np.nonzero. Moreover, you can use np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.
I want to add another simple solution: you can use a list comprehension. It may take longer than the previous ones, but if you take advantage of the Numba package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit

In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])

In [5]: np.where(np.in1d(a, b))[0]
Out[5]: array([0, 6, 9])

In [6]: np.nonzero(np.in1d(a, b))[0]
Out[6]: array([0, 6, 9])

In [7]: np.searchsorted(a, b)
Out[7]: array([0, 6, 9])

In [8]: np.searchsorted(a, np.intersect1d(a, b))
Out[8]: array([0, 6, 9])

In [9]: [i for i, x in enumerate(a) if x in b]
Out[9]: [0, 6, 9]

In [10]: @njit
    ...: def func(a, b):
    ...:     return [i for i, x in enumerate(a) if x in b]

In [11]: func(a, b)
Out[11]: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I have written code to do matrix multiplication of different sizes, but it takes a lot of time to execute.
Code:
# Program to multiply two matrices using nested loops
import time

print("Enter the size of matrix A")
m = int(input())
n = int(input())
print("Enter the size of matrix B")
p = int(input())
q = int(input())

if n == p:
    print('enter matrix A')
else:
    print("invalid entry")
    exit()

our_list1 = []
A = []
for i in range(m):
    for j in range(n):
        number = int(input('Please enter a element '))
        our_list1.append(number)
    A.append(our_list1)
    our_list1 = []
print(A)

print('enter matrix B')
our_list1 = []
B = []
for i in range(p):
    for j in range(q):
        number = int(input('Please enter a element '))
        our_list1.append(number)
    B.append(our_list1)
    our_list1 = []
print(B)

start_time = time.time()

our_list1 = []
R = []
for i in range(m):
    for j in range(q):
        our_list1.append(0)
    R.append(our_list1)
    our_list1 = []
print(R)

for i in range(len(A)):
    # iterating over columns of B
    for j in range(len(B[0])):
        # iterating over rows of B
        for k in range(len(B)):
            R[i][j] += A[i][k] * B[k][j]
print(R)

print("--- %s seconds ---" % (time.time() - start_time))
This method of matrix multiplication takes a lot of time to execute. How can I choose an efficient way to multiply matrices of large dimensions, so that higher-dimensional arrays can be processed smoothly and quickly?
Sample output:
Matrix A: [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
Matrix B: [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[27, 27, 27], [27, 27, 27], [27, 27, 27]]
--- 0.00014400482177734375 seconds ---
It takes 0.00014400482177734375 seconds. Can I improve these timings when I do higher-dimension multiplication?
The timings in your comments have some significant drawbacks:
print() is comparatively expensive and has nothing to do with the calculation. Including it in the timings could take up a big chunk of the overall time.
Using wallclock (time.time()) is not a good way of getting stable timings; you get one run and anything could be happening on your system.
This should give a better test case for comparison:
import numpy as np

def python_lists():
    A = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
    B = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]

    our_list1 = []
    R = []
    for i in range(3):
        for j in range(3):
            our_list1.append(0)
        R.append(our_list1)
        our_list1 = []

    for i in range(len(A)):
        # iterating over columns of B
        for j in range(len(B[0])):
            # iterating over rows of B
            for k in range(len(B)):
                R[i][j] += A[i][k] * B[k][j]

def numpy_array():
    A = np.full((3, 3), 3)
    B = np.full((3, 3), 3)
    result = np.dot(A, B)
And the timings:
%timeit python_lists()
15 µs ± 45.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit numpy_array()
5.57 µs ± 44.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So, NumPy is ~3 times faster for this example. But this would be more significant if you had bigger arrays.
EDIT:
And actually, you could argue that creating A and B inside the function is not helpful for timing the actual matrix multiplication. So if I instead create the lists/arrays first and pass them in, the new timings are:
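(The parameterized versions are not shown; a minimal sketch of what they might look like, assuming A and B are built once outside the functions:)
def python_lists(A, B):
    # same triple loop as before, over a preallocated result
    R = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                R[i][j] += A[i][k] * B[k][j]
    return R

def numpy_array(A, B):
    return np.dot(A, B)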
%timeit python_lists(A, B)
14.4 µs ± 98.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit numpy_array(A, B)
1.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
And, for the sake of completeness, for an array with shape (200, 200):
%timeit python_lists()
6.99 s ± 128 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numpy_array()
5.77 ms ± 43.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
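For the (200, 200) case that is a gap of roughly three orders of magnitude: np.dot dispatches to an optimized BLAS routine, while the pure-Python triple loop pays interpreter overhead on every multiply-add.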