How to avoid Kronecker product in NumPy computations - python

Background
After generating a list of random weights:
sizes = [784,30,10]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
I utilize NumPy's Kronecker product to create foo (with shape (900, 23520)):
foo = np.kron(np.identity(30),weights[0])
Then, I matrix-multiply foo with a slice from data, namely,
bar = np.dot(foo,data[0])
where data[0].shape is (23520,) and data[0].dtype is float32.
Question
foo is rather wasteful. How can weights[0], which has shape (30, 784), be used for the multiplication with data[0] in a more memory-efficient manner?
More generally, data[0] is a slice from an array with shape (1666, 23520), so the multiplication procedure will need to be carried out 1666 times. Also, the data array is nearly sparse, with fewer than 20% of its entries being non-zero.
Here's the loop I had tried:
for i in range(len(data)):
    foo = np.kron(np.identity(30),weights[0])
    bar = np.dot(foo,data[i])

The trick is to reshape data into a 3D tensor and then use np.tensordot against weights[0], thus bypassing the creation of foo, like so -
k = 30 # kernel size
data3D = data.reshape(data.shape[0],k,-1)
out = np.tensordot(data3D, weights[0], axes=(2,1)).reshape(-1,k**2)
Under the hood, tensordot transposes axes, reshapes and then uses np.dot. So, doing all that manual labor ourselves to avoid the function call to tensordot, we would have an equivalent one-liner, like so -
out = data.reshape(-1,data.shape[1]//k).dot(weights[0].T).reshape(-1,k**2)
Related post to understand tensordot.
Sample run
Let's use a toy example to explain what's going on for people who might not have understood the problem:
In [68]: # Toy setup and code run with original codes
...: k = 3 # kernel size, which is 30 in the original case
...:
...: data = np.random.rand(4,6)
...: w0 = np.random.rand(3,2) # this is weights[0]
...: foo = np.kron(np.identity(k), w0)
...: output_first_row = foo.dot(data[0])
So, the question is to get rid of the foo creation step and get to output_first_row and do this for all rows of data.
The proposed solution is :
...: data3D = data.reshape(data.shape[0],k,-1)
...: vectorized_out = np.tensordot(data3D, w0, axes=(2,1)).reshape(-1,k**2)
Let's verify the results :
In [69]: output_first_row
Out[69]: array([ 0.11, 0.13, 0.34, 0.67, 0.53, 1.51, 0.17, 0.16, 0.44])
In [70]: vectorized_out
Out[70]:
array([[ 0.11, 0.13, 0.34, 0.67, 0.53, 1.51, 0.17, 0.16, 0.44],
[ 0.43, 0.23, 0.73, 0.43, 0.38, 1.05, 0.64, 0.49, 1.41],
[ 0.57, 0.45, 1.3 , 0.68, 0.51, 1.48, 0.45, 0.28, 0.85],
[ 0.41, 0.35, 0.98, 0.4 , 0.24, 0.75, 0.22, 0.28, 0.71]])
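The reshape-plus-dot variant can be verified the same way; a quick sketch reusing the toy arrays above:
# Sketch: the manual reshape+dot formulation should match the tensordot result
manual_out = data.reshape(-1, data.shape[1]//k).dot(w0.T).reshape(-1, k**2)
print(np.allclose(manual_out, vectorized_out))  # expected: True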
Runtime test for all proposed approaches -
In [30]: import numpy as np
In [31]: sizes = [784,30,10]
In [32]: weights = [np.random.rand(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
In [33]: data = np.random.rand(1666,23520)
In [37]: k = 30 # kernel size
# @Paul Panzer's soln
In [38]: %timeit (weights[0] @ data.reshape(-1, 30, 784).swapaxes(1, 2)).swapaxes(1, 2)
1 loops, best of 3: 707 ms per loop
In [39]: %timeit np.tensordot(data.reshape(data.shape[0],k,-1), weights[0], axes=(2,1)).reshape(-1,k**2)
10 loops, best of 3: 114 ms per loop
In [40]: %timeit data.reshape(-1,data.shape[1]//k).dot(weights[0].T).reshape(-1,k**2)
10 loops, best of 3: 118 ms per loop
This Q&A and the comments under it might help in understanding how tensordot works with tensors.

You are essentially doing matrix-matrix multiplication where the first factor is weights[0] and the second is data[i] chopped up into 30 equal slices that form the columns.
import numpy as np
sizes = [784,30,10]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1],sizes[1:])]
k = 2
# create sparse data
data = np.maximum(np.random.uniform(-100, 1, (k, 23520)), 0)
foo = np.kron(np.identity(30),weights[0])
# This is the original loop as a list comprehension
bar = [np.dot(foo,d) for d in data]
# This is the equivalent using matrix multiplication.
# We can take advantage of the fact that the '@' operator
# can do batch matrix multiplication (it uses the last two
# dimensions as the matrix and all others as batch index).
# The reshape does the chopping up but gives us rows where columns
# are required, hence the first swapaxes.
# The second swapaxes is to make the result directly comparable to
# the `np.kron` based result.
bar2 = (weights[0] @ data.reshape(k, 30, 784).swapaxes(1, 2)).swapaxes(1, 2)
# Instead of letting numpy do the batching we can glue all the
# columns of all the second factors together into one matrix
bar3 = (weights[0] @ data.reshape(-1, 784).T).T.reshape(k, -1)
# This last formulation works more or less unchanged on sparse data
from scipy import sparse
dsp = sparse.csr_matrix(data.reshape(-1, 784))
bar4 = (weights[0] @ dsp.T).T.reshape(k, -1)
print(np.allclose(bar, bar2.reshape(k, -1)))
print(np.allclose(bar, bar3))
print(np.allclose(bar, bar4))
Prints:
True
True
True

Related

Most efficient way of adding elements given the index list in numpy

Assume we have a numpy array A with shape (N,), a matrix D with shape (M, 3) which holds data, and another matrix I with shape (M, 3) which holds the corresponding index in A of each data element in D. How can we construct A given D and I such that elements with repeated indexes are added together?
Example:
############# A[I] := D ###################################
A = [0.5, 0.6] # Final Reduced Data Vector
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]] # Data
I = [[0, 1, 0], [0, 1, 1]] # Indices
For example:
A[0] = D[0][0] + D[0][2] + D[1][0] # 0.5 = 0.1 + 0.2 + 0.2
Since in the index matrix we have:
I[0][0] = I[0][2] = I[1][0] = 0
The goal is to avoid looping over all elements, in order to be efficient for large N, M (10^6-10^9).
I doubt you can get much faster than np.bincount - and notice how the official documentation provides this exact use case:
# Your example
A = [0.5, 0.6]
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
# Solution
import numpy as np
D, I = np.array(D).flatten(), np.array(I).flatten()
print(np.bincount(I, D)) #[0.5 0.6]
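If A needs a fixed length M greater than I.max() + 1, np.bincount's minlength argument pads the result with zeros; a small sketch with a hypothetical M = 4:
print(np.bincount(I, D, minlength=4)) #[0.5 0.6 0.  0. ]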
The shape of I and D doesn't matter: you can clearly ravel the arrays without changing the outcome:
index = np.ravel(I)
data = np.ravel(D)
Now you can sort both arrays according to I:
sorter = np.argsort(index)
index = index[sorter]
data = data[sorter]
This is helpful because now index looks like this:
0, 0, 0, 1, 1, 1
And data is this:
0.1, 0.2, 0.2, 0.1, 0.4, 0.1
Adding together runs of consecutive numbers should be easier than processing random locations. Let's start by finding the indices where the runs start:
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
Now you can use the fact that ufuncs like np.add have a partial reduce operation called reduceat. This allows you to sum regions of an array:
a = np.add.reduceat(data, runs)
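For the sorted toy data above, runs is [0, 3], so reduceat sums data[0:3] and data[3:]; a minimal sketch:
np.add.reduceat([0.1, 0.2, 0.2, 0.1, 0.4, 0.1], [0, 3])  # -> array([0.5, 0.6])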
If I is guaranteed to contain all indices in [0, A.size) at least once, you're done: just assign to A instead of a. If not, you can make the mapping using the fact that the start of each run in index is the target index:
A = np.zeros(M)
A[index[runs]] = a
Algorithmic complexity analysis:
ravel is O(1) in time and space if the data is in an array. If it's a list, this is O(MN) in time and space
argsort is O(MN log MN) in time and O(MN) in space
Indexing by sorter is O(MN) in time and space
Computing runs is O(MN) in time and O(MN + M) = O(MN) in space
reduceat is a single pass: O(MN) in time, O(M) in space
Reassigning A is O(M) in time and space
Total: O(MN log MN) time, O(MN) space
TL;DR
def make_A(D, I, M):
    index = np.ravel(I)
    data = np.ravel(D)
    sorter = np.argsort(index)
    index = index[sorter]
    if index[0] < 0 or index[-1] >= M:
        raise ValueError('Bad indices')
    data = data[sorter]
    runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
    a = np.add.reduceat(data, runs)
    if a.size == M:
        return a
    A = np.zeros(M)
    A[index[runs]] = a
    return A
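For example, on the toy data from the question, a quick sanity check:
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
print(make_A(D, I, 2))  # [0.5 0.6]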
If you know the size of A beforehand, as it seems you do, you can simply use add.at:
import numpy as np
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
arr_D = np.array(D)
arr_I = np.array(I)
A = np.zeros(2)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
If you don't know the size of A, you can use max to compute it:
A = np.zeros(arr_I.max() + 1)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
The time complexity of this algorithm is O(N), with space complexity O(N) as well.
The expression arr_I.max() + 1 is what bincount does under the hood; from the documentation:
The result of binning the input array. The length of out is equal to
np.amax(x)+1.
That being said, bincount is at least one order of magnitude faster:
I = np.random.choice(1000, size=(1000, 3), replace=True)
D = np.random.random((1000, 3))
%timeit make_A_with_at(I, D, 1000)
213 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_A_with_bincount(I, D)
11 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Keep receiving Too many indices for array for interpolation

interp - Program to interpolate data using Lagrange
I am not able to complete the for-loop in the code below. I don't see anything wrong with it, since I chose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values.
def intrpf(xi,x,y):
    """Function to interpolate between data points
    using Lagrange polynomial (quadratic)
    Inputs
      x    Vector of x coordinates of data points (3 values)
      y    Vector of y coordinates of data points (3 values)
      xi   The x value where interpolation is computed
    Output
      yi   The interpolation polynomial evaluated at xi
    """
    #* Calculate yi = p(xi) using Lagrange polynomial
    yi = ( (xi-x[1])*(xi-x[2])/((x[0]-x[1])*(x[0]-x[2])) * y[0]
         + (xi-x[0])*(xi-x[2])/((x[1]-x[0])*(x[1]-x[2])) * y[1]
         + (xi-x[0])*(xi-x[1])/((x[2]-x[0])*(x[2]-x[1])) * y[2] )
    return yi
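The intrpf function itself works; as a quick sanity check (hypothetical inputs), the quadratic through the points (0, 0), (1, 1), (2, 4) is x**2, so:
print(intrpf(1.5, [0, 1, 2], [0, 1, 4]))  # 2.25 = 1.5**2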
#* Initialize the data points to be fit by quadratic
x = np.empty(3)
y = np.empty(3)
print ('Enter data points as x,y pairs (e.g., [1, 2])')
for i in range(3):
    temp = np.array(input('Enter data point: '))
    x[i] = temp[0]
    y[i] = temp[1]
#* Establish the range of interpolation (from x_min to x_max)
xr = np.array(input('Enter range of x values as [x_min, x_max]: '))
I'm getting stuck on this part, where everything seems properly set up, but "too many indices for array" appears for xi[i] within the for loop.
#* Find yi for the desired interpolation values xi using
# the function intrpf
nplot = 100 # Number of points for interpolation curve
xi = np.empty(nplot)
yi = np.empty(nplot)
for i in range(nplot) :
    xi[i] = xr[0] + (xr[1]-xr[0])* i/float(nplot)
    yi[i] = intrpf(xi[i], x, y) # Use intrpf function to interpolate
From the docs of np.array:
Parameters:
object: _array_like_
An array, any object exposing the array interface, an object whose array method returns an array, or any (nested) sequence.
This means array should receive something like a list in order to do the casting, while input returns a string. What Python is effectively trying to do here is something like
np.array('[1, 2]')
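That produces a 0-dimensional array containing the whole string, which is exactly why indexing it fails; a quick illustration:
temp = np.array('[1, 2]')
print(temp.shape)  # () -- a 0-dimensional array
temp[0]            # IndexError: too many indices for array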
While it might be tempting to do something like
np.array(eval(input()))
you should never do this, because it is unsafe: it allows the user to execute arbitrary code in your program. If you really need that kind of input, I would suggest something like
np.array(list(map(int, input('Enter data point: ')
                       .replace('[','')
                       .replace(']','')
                       .split(','))))
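A safer alternative, if bracketed input is really wanted, is ast.literal_eval from the standard library, which evaluates only Python literals (a sketch, not part of the original answer):
import ast
temp = np.array(ast.literal_eval(input('Enter data point: ')))  # typing [1, 2] yields array([1, 2])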
The error occurs with your data input lines:
Enter data points as x,y pairs (e.g., [1, 2]
Enter data point: [1,2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-6-8d648ad8c9e4> in <module>
     22 for i in range(3):
     23     temp = np.array(input('Enter data point: '))
---> 24     x[i] = temp[0]
     25     y[i] = temp[1]
     26
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
The code doesn't even get to the "I chose np.empty(nplot) to create the 1D array for xi, and for some reason the loop won't fill those values" part.
When asking for help, give full and accurate information about the error.
If I change the input lines to:
...: x = np.empty(3)
...: y = np.empty(3)
...: print ('Enter data points as x,y pairs')
...: for i in range(3):
...:     temp = input('Enter data point: ').split()
...:     x[i] = temp[0]
...:     y[i] = temp[1]
...:
...: #* Establish the range of interpolation (from x_min to x_max)
...: xr = np.array(input('Enter range of x values as x_min, x_max: ').split(),float)
Enter data points as x,y pairs
Enter data point: 1 2
Enter data point: 3 4
Enter data point: 5 6
Enter range of x values as x_min, x_max: 0 4
In [9]: x
Out[9]: array([1., 3., 5.])
In [10]: y
Out[10]: array([2., 4., 6.])
In [11]: xr
Out[11]: array([0., 4.])
Getting array values via user input is not ideal, but this at least works. input (in Py3) does not evaluate the inputs; it just returns a string. I split it (on default space), and then assign the values to an array. x is defined as a float array, so the x[i]=temp[0] takes care of converting the string to float. Similarly the xr line makes a float array from the string inputs. This input style is not very robust; I could easily raise an error with wrong input.
===
The rest of the code runs with this input:
In [12]: nplot = 100 # Number of points for interpolation curve
...: xi = np.empty(nplot)
...: yi = np.empty(nplot)
...: for i in range(nplot) :
...:     xi[i] = xr[0] + (xr[1]-xr[0])* i/float(nplot)
...:     yi[i] = intrpf(xi[i], x, y) # Use intrpf function to interpolate
...:
In [13]: xi
Out[13]:
array([0. , 0.04, 0.08, 0.12, 0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 ,
0.44, 0.48, 0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
...
3.52, 3.56, 3.6 , 3.64, 3.68, 3.72, 3.76, 3.8 , 3.84, 3.88, 3.92,
3.96])
In [14]: yi
Out[14]:
array([1. , 1.04, 1.08, 1.12, 1.16, 1.2 , 1.24, 1.28, 1.32, 1.36, 1.4 ,
1.44, 1.48, 1.52, 1.56, 1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84,
....
4.52, 4.56, 4.6 , 4.64, 4.68, 4.72, 4.76, 4.8 , 4.84, 4.88, 4.92,
4.96])

How to vectorize multiple matrix multiplications in numpy?

For a conceptual idea of what I mean, I have 2 data points:
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
And a 2x2 matrix:
y = np.array([[2, 2], [2, 2]])
If I perform x_0.T @ y @ x_0, I get array([[ 8.]]). Similarly, x_1.T @ y @ x_1 returns array([[ 72.]]).
But is there a way to perform both of these calculations in one go, without a for loop? Obviously the speed-up here is negligible, but I am working with much more data points than presented here.
With x as the column stacked version of x_0, x_1 and so on, we can use np.einsum -
np.einsum('ji,jk,ki->i',x,y,x)
With a mix of np.einsum and matrix-multiplication -
np.einsum('ij,ji->i',x.T.dot(y),x)
As stated earlier, x was assumed to be column-stacked, like so :
x = np.column_stack((x_0, x_1))
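In subscript terms, 'ji,jk,ki->i' contracts over j and k, i.e. it computes x[:, i].T @ y @ x[:, i] separately for each column i. A quick check on the toy data from the question:
print(np.einsum('ji,jk,ki->i', x, y, x))  # [  8.  72.]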
Runtime test -
In [236]: x = np.random.randint(0,255,(3,100000))
In [237]: y = np.random.randint(0,255,(3,3))
# Proposed in @titipata's post/comments under this post
In [238]: %timeit (x.T.dot(y)*x.T).sum(1)
100 loops, best of 3: 3.45 ms per loop
# Proposed earlier in this post
In [239]: %timeit np.einsum('ji,jk,ki->i',x,y,x)
1000 loops, best of 3: 832 µs per loop
# Proposed earlier in this post
In [240]: %timeit np.einsum('ij,ji->i',x.T.dot(y),x)
100 loops, best of 3: 2.6 ms per loop
Basically, you want to do the operation (x.T).dot(A).dot(x) for all x that you have.
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
x = np.hstack((x_0, x_1)) # [[ 0.6 2.6], [ 1.4 3.4]]
The easy way to think about it is to do the multiplication for every x_i that you have with y:
[x_i.dot(y).dot(x_i) for x_i in x.T]
>> [8.0, 72.0]
But of course this is not too efficient. However, you can use a trick: take the dot product of x with y first, multiply the result elementwise by x itself, and sum over the columns, i.e. do the final dot product manually. This will make the calculation much faster:
x = x.T
(x.dot(y) * x).sum(axis=1)
>> array([ 8., 72.])
Note that I transpose the matrix first because we want to multiply each column of y with each row of x.

Masking array X based on a condition with array Y of different size

I have two 1-dim arrays X and Y of different size. I am trying to build the 2-dim array resulting from a condition on X and Y. For instance:
X = np.array([0.3, 2.1, 4.3])
Y = np.array([1.5, 3.5])
mask = X > Y[:,np.newaxis]
and now I would like to perform something akin to X[mask] = X[mask] + 1, so that for the example above it would result in:
newX = np.array([[0.3, 3.1, 5.3], [0.3, 2.1, 5.3]])
I managed to get this result by doing:
newX = np.append(X, X).reshape(2,3)
newX[mask]=newX[mask]+1
But this hardcodes the length of the Y array (2 in the example), and includes a copy with np.append which is costly in cases where X and Y are actually large arrays (and it is probably quite ugly as well). Is there a correct way to do this?
In this particular case, where you wish to add 1 wherever mask is True, perhaps the simplest way is to take advantage of broadcasting and dtype promotion -- that is, booleans are treated as ints in a numeric context.
In [49]: X + mask
Out[49]:
array([[ 0.3, 3.1, 5.3],
[ 0.3, 2.1, 5.3]])
Use broadcasting instead of an explicitly tiled copy of X if possible.
However, if you need newX you could use
In [54]: np.tile(X, (Y.size,1))
Out[54]:
array([[ 0.3, 2.1, 4.3],
[ 0.3, 2.1, 4.3]])
np.tile avoids the multiplications done by np.outer so it is faster for this purpose.
For example, with this setup:
import numpy as np
import timeit
import collections
import matplotlib.pyplot as plt
timing = collections.defaultdict(list)
Ns = np.linspace(10, 10000, 5).astype(int)
Ms = np.linspace(10, 10000, 5).astype(int)
for N, M in zip(Ns, Ms):
    X = np.random.random(N)
    Y = np.random.random(M)
    timing['tile'].append(timeit.timeit(
        'np.tile(X, (Y.size,1))',
        'from __main__ import np, X, Y',
        number=10))
    timing['outer'].append(timeit.timeit(
        'np.outer(np.ones_like(Y),X)',
        'from __main__ import np, X, Y',
        number=10))
plt.plot(Ns*Ms, timing['tile'], label='tile')
plt.plot(Ns*Ms, timing['outer'], label='outer')
plt.legend(loc='best')
plt.show()
As the size of the arrays gets larger, the difference between tile and outer should diminish relative to total time, because the time required to allocate/manage large arrays in RAM/swap overwhelms the relatively small computational costs.

Fastest way to compute entropy of each numpy array row?

I have an array of size MxN and I'd like to compute the entropy value of each row. What would be the fastest way to do so?
scipy.special.entr computes -x*log(x) for each element in an array. After calling that, you can sum the rows.
Here's an example. First, create an array p of positive values whose rows sum to 1:
In [23]: np.random.seed(123)
In [24]: x = np.random.rand(3, 10)
In [25]: p = x/x.sum(axis=1, keepdims=True)
In [26]: p
Out[26]:
array([[ 0.12798052, 0.05257987, 0.04168536, 0.1013075 , 0.13220688,
0.07774843, 0.18022149, 0.1258417 , 0.08837421, 0.07205402],
[ 0.08313743, 0.17661773, 0.1062474 , 0.01445742, 0.09642919,
0.17878489, 0.04420998, 0.0425045 , 0.12877228, 0.1288392 ],
[ 0.11793032, 0.15790292, 0.13467074, 0.11358463, 0.13429674,
0.06003561, 0.06725376, 0.0424324 , 0.05459921, 0.11729367]])
In [27]: p.shape
Out[27]: (3, 10)
In [28]: p.sum(axis=1)
Out[28]: array([ 1., 1., 1.])
Now compute the entropy of each row. entr uses the natural logarithm, so to get the base-2 log, divide the result by log(2).
In [29]: from scipy.special import entr
In [30]: entr(p).sum(axis=1)
Out[30]: array([ 2.22208731, 2.14586635, 2.22486581])
In [31]: entr(p).sum(axis=1)/np.log(2)
Out[31]: array([ 3.20579434, 3.09583074, 3.20980287])
If you don't want the dependency on scipy, you can use the explicit formula:
In [32]: (-p*np.log2(p)).sum(axis=1)
Out[32]: array([ 3.20579434, 3.09583074, 3.20980287])
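One caveat with the explicit formula: np.log2(p) is -inf where p is exactly zero, so the row sum becomes nan for rows containing zero probabilities (entr doesn't have this problem, since entr(0) is defined as 0). A minimal guard, assuming zero entries should contribute zero entropy:
p_safe = np.where(p > 0, p, 1.0)  # log2(1) = 0, so zero entries contribute nothing
H = -(p * np.log2(p_safe)).sum(axis=1)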
As @Warren pointed out, it's unclear from your question whether you are starting out from an array of probabilities, or from the raw samples themselves. In my answer I've assumed the latter, in which case the main bottleneck will be computing the bin counts over each row.
Assuming that each vector of samples is relatively long, the fastest way to do this will probably be to use np.bincount:
import numpy as np
def entropy(x):
    """
    x is assumed to be an (nsignals, nsamples) array containing integers between
    0 and n_unique_vals
    """
    x = np.atleast_2d(x)
    nrows, ncols = x.shape
    nbins = x.max() + 1
    # count the number of occurrences for each unique integer between 0 and x.max()
    # in each row of x (a list comprehension, since np.vstack needs a sequence)
    counts = np.vstack([np.bincount(row, minlength=nbins) for row in x])
    # divide by number of columns to get the probability of each unique value
    p = counts / float(ncols)
    # compute Shannon entropy in bits
    return -np.sum(p * np.log2(p), axis=1)
Although Warren's method of computing the entropies from the probability values using entr is slightly faster than using the explicit formula, in practice this is likely to represent a tiny fraction of the total runtime compared to the time taken to compute the bin counts.
Test correctness for a single row:
vals = np.arange(3)
prob = np.array([0.1, 0.7, 0.2])
row = np.random.choice(vals, p=prob, size=1000000)
print("theoretical H(x): %.6f, empirical H(x): %.6f" %
(-np.sum(prob * np.log2(prob)), entropy(row)[0]))
# theoretical H(x): 1.156780, empirical H(x): 1.157532
Test speed:
In [1]: %%timeit x = np.random.choice(vals, p=prob, size=(1000, 10000))
....: entropy(x)
....:
10 loops, best of 3: 34.6 ms per loop
If your data don't consist of integer indices between 0 and the number of unique values, you can convert them into this format using np.unique:
y = np.random.choice([2.5, 3.14, 42], p=prob, size=(1000, 10000))
unq, x = np.unique(y, return_inverse=True)
x.shape = y.shape
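Since x now contains integer codes in the range [0, len(unq)), it can be passed straight to the entropy function defined above:
H = entropy(x)  # per-row entropies, in bits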
