Concatenating views in NumPy

Slicing a NumPy array creates a lightweight view (no data is copied) that allows assigning to elements of the original array. For example:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
a[2:4] = [6, 7]
print(a)
# [1 2 6 7 5]
But what about multiple views: how do I concatenate them to create a bigger view that still assigns into the original array? E.g., for an imaginary function concatenate_views(...):
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
concatenate_views((a[1:3], a[4:6], a[7:9])) = [11, 12, 13, 14, 15, 16]
print(a)
# should print [1 11 12 4 13 14 7 15 16 10]
Of course, I could create a list of indexes for each view, just by converting the slices to indexes and then concatenating them. That would give me all the indexes of the concatenated views, and I could use those indexes to create a combined view. But this is not what I want: the slices can be very long, so converting and storing them as explicit indexes would be inefficient. I want NumPy to keep the notion of the slice representation, i.e. to stay aware of the underlying slices of all concatenated views, so that internally it can just loop over the slice ranges. A minimal sketch of the index-conversion approach I want to avoid is shown below.
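(The helper name slices_to_indexes below is hypothetical, purely for illustration.)
import numpy as np

def slices_to_indexes(slices, length):
    # Materialize each slice into explicit integer indexes and concatenate;
    # this is exactly the memory cost the question wants to avoid.
    return np.concatenate([np.arange(length)[s] for s in slices])

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
idx = slices_to_indexes((slice(1, 3), slice(4, 6), slice(7, 9)), len(a))
a[idx] = [11, 12, 13, 14, 15, 16]
print(a)  # [ 1 11 12  4 13 14  7 15 16 10]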
It would also be nice to generalize the problem: not only concatenating views, but also allowing any arbitrary tree of slicing/indexing operations, e.g. concatenate views, then apply some slicing, then indexing, then slicing, then concatenate again. Also N-dimensional slicing/indexing, i.e. all the fancy stuff that can be done with a single un-concatenated view.
The main point of concatenated views is efficiency. Of course, we can represent any view or slicing operation by an N-D array of integer indexes (coordinates, like meshgrid) and then use this array to make a view of the source array. But if NumPy can keep the notion of the source set of slices instead of an array of integers, then first it will be lightweight (much less memory consumption), and second, instead of reading indexes from memory, NumPy can internally loop (iterate) through each slice more efficiently in C++ loops.
By having a concatenated view, I want to be able to apply any NumPy operation, like np.mean(...), to the combined view efficiently.
The full procedure for concatenating views with N-D slicing, based on a 2D example, is described below.
One Step is described below:
2D array slicing using 3 slices for each axis
a, b, c - sizes of "slices" along axis 0
d, e, f - sizes of "slices" along axis 1
Each "slice" is either a slice(start, stop, step) or a 1D array of integer indexes.
d e f
.......
a.0.1.2.
.......
b.3.4.5.
.......
c.6.7.8.
.......
Above, 0 1 2 3 4 5 6 7 8 denote not single integers but 2D sub-arrays.
Dots (`.`) also denote 2D sub-arrays.
Sub-views shapes:
0:(a, d), 1:(a, e), 2:(a, f)
3:(b, d), 4:(b, e), 5:(b, f)
6:(c, d), 7:(c, e), 8:(c, f)
Final aggregated (concatenated) view shape:
((a + b + c), (d + e + f))
containing 2D array
012
345
678
There can be more than one Step; each next Step applies a new sequence of slicing
to the final view obtained in the previous Step. Each Step has a different set of slice
sizes and a different number of slices per dimension.
In general, each next Step reduces the total number of elements, except when slices
or indexes overlap, in which case you may get more elements, but with duplicates.
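A minimal sketch of one such 2D Step, using np.r_ to concatenate the per-axis slices and np.ix_ to form the outer product of indexes (note this materializes the indexes, so it demonstrates only the semantics, not the desired lazy implementation; the concrete sizes are made up):
import numpy as np

x = np.arange(100).reshape(10, 10)
rows = np.r_[0:2, 3:6, 7:10]   # sizes a=2, b=3, c=3 along axis 0
cols = np.r_[0:3, 4:6, 8:10]   # sizes d=3, e=2, f=2 along axis 1
step = x[np.ix_(rows, cols)]   # shape ((a+b+c), (d+e+f)) = (8, 7)
print(step.shape)              # (8, 7)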

You can use np.r_ to concatenate slice objects and assign back to the indexed array:
a[np.r_[1:3, 4:6, 7:9]] = [11, 12, 13, 14, 15, 16]
print(a)
array([ 1, 11, 12, 4, 13, 14, 7, 15, 16, 10])
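Note that np.r_ materializes the slices into one explicit integer index array, so this solves the assignment part of the question but not the lazy-view part:
print(np.r_[1:3, 4:6, 7:9])  # [1 2 4 5 7 8]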
Update
Based on your update, I think you might want something like:
from itertools import islice
it = iter([11, 12, 13, 14, 15, 16])
for s in slice(1, 3), slice(4, 6), slice(7, 9):
    a[s] = list(islice(it, s.stop - s.start))
print(a)
array([ 1, 11, 12,  4, 13, 14,  7, 15, 16, 10])

You can only concatenate views if they are contiguous in terms of dtypes, strides and offsets. Here is one way to check. This way is likely incomplete, but it illustrates the gist of it. Basically, if the views share a base, and the strides and offsets are aligned so that they are on the same grid, you can concatenate.
In the spirit of TDD, I will work with the following example:
x = np.arange(24).reshape(4, 6)
We (or at least I) want the following to be concatenatable:
a, b = x[:, :4], x[:, 4:] # Basic case
a, b = x[:, :4:2], x[:, 4::2] # Strided
a, b = x[:, :4:2], x[:, 2::2] # Strided overlapping
a, b = x[1:2, 1:4], x[2:4, 1:4] # Stacked
# Completely reshaped:
a, b = x.ravel()[:12].reshape(3, 4), x.ravel()[12:].reshape(3, 4)
# Equivalent to
a, b = x[:2, :].reshape(3, 4), x[2:, :].reshape(3, 4)
We do not want the following to be concatenatable:
a, b = x, np.arange(12).reshape(2, 6) # Buffer mismatch
a, b = x[0, :].view(np.uint), x[1:, :] # Dtype mismatch
a, b = x[:, ::2], x[:, ::3] # Stride mismatch
a, b = x[:, :4], x[:, 4::2] # Stride mismatch
a, b = x[:, :3], x[:, 4:] # Overlap mismatch
a, b = x[:, :4:2], x[:, 3::2] # Overlap mismatch
a, b = x[:-1, :-1], x[1:, 1:] # Overlap mismatch
a, b = x[:-1, :4], x[:, 4:] # Shape mismatch
The following could be interpreted as concatenatable, but won't be in this case:
a, b = x, x[1:-1, 1:-1]
The idea is that everything (dtype, strides, offsets) has to match exactly. Only one axis offset is allowed to be different between the views, as long as it is no more than one stride away from the edge of the other view. The only possible exception is when one view is fully contained in another, but we will ignore this scenario here. Generalizing to multiple dimensions should be pretty simple if we use array operations on the offsets and strides.
def cat_slices(a, b):
    if a.base is not b.base:
        raise ValueError('Buffer mismatch')
    if a.dtype != b.dtype:  # I don't think you can use `is` here in general
        raise ValueError('Dtype mismatch')
    sa = np.array(a.strides)
    sb = np.array(b.strides)
    if (sa != sb).any():
        raise ValueError('Stride mismatch')
    oa = np.byte_bounds(a)[0]
    ob = np.byte_bounds(b)[0]
    if oa > ob:
        a, b = b, a
        oa, ob = ob, oa
    offset = ob - oa
    # Check if you can get to `b` from `a` by moving along exactly one axis.
    # This part works consistently for arrays with internal overlap.
    div = np.zeros_like(sa)
    mod = np.ones_like(sa)  # Use ones to auto-flag divide-by-zero
    np.divmod(offset, sa, where=sa.astype(bool), out=(div, mod))
    zeros = np.flatnonzero((mod == 0) & (div >= 0) & (div <= a.shape))
    if not zeros.size:
        raise ValueError('Overlap mismatch')
    axis = zeros[0]
    check_shape = np.equal(a.shape, b.shape)
    check_shape[axis] = True
    if not check_shape.all():
        raise ValueError('Shape mismatch')
    shape = list(a.shape)
    shape[axis] = b.shape[axis] + div[axis]
    start = np.byte_bounds(a)[0] - np.byte_bounds(a.base)[0]
    return np.ndarray(shape, dtype=a.dtype, buffer=a.base, offset=start, strides=a.strides)
Some things that this function does not handle:
Merging flags
Broadcasting
Handling arrays that are fully contained within each other but with multi-axis offsets
Negative strides
You can, however, check that it returns the expected views (and errors) for all the cases shown above. In a more production-y version, I could envision this enhancing np.concatenate, so for failed cases, it would just copy data instead of raising an error.
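A quick sanity check of the happy path, as a sketch (assuming the function above and a NumPy version where np.byte_bounds is still available):
x = np.arange(24).reshape(4, 6)
a, b = x[:, :4], x[:, 4:]      # Basic case
c = cat_slices(a, b)
print(c.shape)                 # (4, 6)
print((c == x).all())          # True: the two halves concatenate back to x
print(np.shares_memory(c, x))  # True: still a view, no data copied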

Related

PyTorch's torch.as_strided with negative strides for making a Toeplitz matrix

I am writing a jury-rigged PyTorch version of scipy.linalg.toeplitz, which currently has the following form:
def toeplitz_torch(c, r=None):
    c = torch.tensor(c).ravel()
    if r is None:
        r = torch.conj(c)
    else:
        r = torch.tensor(r).ravel()
    # Flip c left to right.
    idx = [i for i in range(c.size(0)-1, -1, -1)]
    idx = torch.LongTensor(idx)
    c = c.index_select(0, idx)
    vals = torch.cat((c, r[1:]))
    out_shp = len(c), len(r)
    n = vals.stride(0)
    return torch.as_strided(vals[len(c)-1:], size=out_shp, stride=(-n, n)).copy()
But torch.as_strided currently does not support negative strides. My function, therefore, throws the error:
RuntimeError: as_strided: Negative strides are not supported at the moment, got strides: [-1, 1].
My (perhaps incorrect) understanding of as_strided is that it inserts the values of the first argument into a new array whose size is specified by the second argument, linearly indexing those values in the original array and placing them according to the strides given by the final argument.
Both the NumPy and PyTorch documentation concerning as_strided have scary warnings about using the function with "extreme care" and I don't understand this function fully, so I'd like to ask:
Is my understanding of as_strided correct?
Is there a simple way to rewrite this so negative strides work?
Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
> 1. Is my understanding of as_strided correct?
The stride is an interface for your tensor to access the underlying contiguous data buffer. It does not insert values, and no copies of the values are made by torch.as_strided; the strides define the artificial layout of what we refer to as a multi-dimensional array (in NumPy) or tensor (in PyTorch).
As Andreas K. puts it in another answer:
Strides are the number of bytes to jump over in the memory in order to get from one item to the next item along each direction/dimension of the array. In other words, it's the byte-separation between consecutive items for each dimension.
Please feel free to read the answers over there if you have some trouble with strides. Here we will take your example and look at how it is implemented with as_strided.
The example given by Scipy for linalg.toeplitz is the following:
>>> toeplitz([1,2,3], [1,4,5,6])
array([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To do so, they first construct the list of values (what we can refer to as the underlying values, not the actual underlying data): vals, which is constructed as [3 2 1 4 5 6], i.e. the flipped Toeplitz column followed by the row without its first element.
Now notice the arguments passed to np.lib.stride_tricks.as_strided:
values: vals[len(c)-1:]. Notice the slice: the tensor shows up smaller, yet the underlying values remain, and they correspond to those of vals. Go ahead and compare the two with storage_offset: it's just an offset of 2, the values are still there! How this works is that it essentially shifts the indices such that index=0 refers to the value 1, index=1 to 4, etc.
shape: given by the column/row inputs, here (3, 4). This is the shape of the resulting object.
strides: this is the most important piece: (-n, n), in this case (-1, 1)
The most intuitive thing to do with strides is to describe a mapping between the multi-dimensional space: (i, j) ∈ [0,3[ x [0,4[ and the flattened 1D space: k ∈ [0, 3*4[. Since the strides are equal to (-n, n) = (-1, 1), the mapping is -n*i + n*j = -1*i + 1*j = j-i. Mathematically you can describe your matrix as M[i, j] = F[j-i] where F is the flattened values vector [3 2 1 4 5 6].
For instance, let's try i=1 and j=2. If you look at the Toeplitz matrix above, M[1, 2] = 4. Indeed, F[k] = F[j-i] = F[1] = 4.
If you look closely, you will see the trick behind negative strides: they allow you to 'reference' negative indices. For instance, if you take j=0 and i=2, then you get k=-2. Remember how vals was given an offset of 2 by slicing vals[len(c)-1:]. If you look at its own underlying data storage, it's still [3 2 1 4 5 6], but with an offset. The mapping for vals (in this case i: 1D -> k: 1D) would be M'[i] = F'[k] = F'[i+2] because of the offset. This means M'[-2] = F'[0] = 3.
Above, I defined M' as vals[len(c)-1:], which is basically equivalent to the following tensor:
>>> torch.as_strided(vals, size=(len(vals)-2,), stride=(1,), storage_offset=2)
tensor([1, 4, 5, 6])
Similarly, I defined F' as the flattened vector of underlying values: [3 2 1 4 5 6].
The usage of strides is indeed a very clever way to define a Toeplitz matrix!
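For reference, NumPy does support negative strides, so the same trick can be sketched there; this mirrors what scipy.linalg.toeplitz effectively does (strides are in bytes in NumPy):
import numpy as np
from numpy.lib.stride_tricks import as_strided

c, r = np.array([1, 2, 3]), np.array([1, 4, 5, 6])
vals = np.concatenate((c[::-1], r[1:]))  # [3 2 1 4 5 6]
n = vals.strides[0]                      # one item, in bytes
M = as_strided(vals[len(c)-1:], shape=(len(c), len(r)), strides=(-n, n))
print(M)
# [[1 4 5 6]
#  [2 1 4 5]
#  [3 2 1 4]]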
> 2. Is there a simple way to rewrite this so negative strides work?
The issue is, negative strides are not implemented in PyTorch... I don't believe there is a way around it with torch.as_strided, otherwise it would be rather easy to extend the current implementation and provide support for that feature.
There are however alternative ways to solve the problem. It is entirely possible to construct a Toeplitz matrix in PyTorch, but that won't be with torch.as_strided.
We will do the mapping ourselves: for each element of M indexed by (i, j), we will find out the corresponding index k which is simply j-i. This can be done with ease, first by gathering all (i, j) pairs from M:
>>> i, j = torch.ones(3, 4).nonzero().T
(tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]))
Now we essentially have k:
>>> j-i
tensor([ 0, 1, 2, 3, -1, 0, 1, 2, -2, -1, 0, 1])
We just need to construct a flattened tensor of all possible values from the row r and column c inputs. Negative indexed values (the content of c) are put last and flipped:
>>> values = torch.cat((r, c[1:].flip(0)))
tensor([1, 4, 5, 6, 3, 2])
Finally index values with k and reshape:
>>> values[j-i].reshape(3, 4)
tensor([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To sum it up, my proposed implementation would be:
def toeplitz(c, r):
    vals = torch.cat((r, c[1:].flip(0)))
    shape = len(c), len(r)
    i, j = torch.ones(*shape).nonzero().T
    return vals[j-i].reshape(*shape)
> 3. Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
That's an interesting question because torch.as_strided doesn't have a backward function implemented. This means you wouldn't have been able to backpropagate to c and r! With the above method, however, which uses 'backward-compatible' builtins, the backward pass comes free of charge.
Notice the grad_fn on the output:
>>> toeplitz(torch.tensor([1., 2., 3.], requires_grad=True),
...          torch.tensor([1., 4., 5., 6.], requires_grad=True))
tensor([[1., 4., 5., 6.],
        [2., 1., 4., 5.],
        [3., 2., 1., 4.]], grad_fn=<ViewBackward>)
This was a quick draft (that did take a little while to write down); I will make some edits. If you have questions or remarks, don't hesitate to comment! I would be interested in seeing other answers, as I am not an expert on strides; this is just my take on the problem.

The simplest way to expand a tensor so that it has the same dimension as another tensor

Let flag be a one-dimensional vector. Let o1 and o2 be arrays of more than one dimension; maybe 2 or 4 or something else.
Is there a simpler way to achieve the following in NumPy?
import numpy as np

flag = np.random.randint(0, 2, 10)
# we take o1 and o2 as four-dimensional tensors for an example
o1 = np.ones((10, 2, 4, 4))
o2 = np.zeros((10, 2, 4, 4))
while len(flag.shape) < len(o1.shape):
    flag = np.expand_dims(flag, -1)
o = np.where(flag, o1, o2)
print(o)
I think what you have looks fine; you could remove the while loop and add the broadcast dimensions analytically. This can either be done as in my comment, or you could use np.reshape, which is slightly more readable.
flag = np.random.randint(0, 2, 10)
o1 = np.ones((10, 2, 4, 4))
o2 = np.zeros((10, 2, 4, 4))
diff = o1.ndim - flag.ndim
flag = flag.reshape(-1, *(1,)*diff)
o = np.where(flag, o1, o2)
You calculate the difference in the number of dimensions between flag and o1, then add that many empty dimensions at the end of flag. A tuple multiplied by a scalar is repeated diff times, and it is then unpacked as arguments to np.reshape.
To address my comment: the slice notation : is for selecting multiple indices within a range. If you leave it empty, it selects all indices; this is equivalent to slice(None, None, None) or slice(None). slice works much like range with regard to its parameters, and I am basically doing the same thing as explained above.
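For completeness, here is a sketch of the indexing-based equivalent hinted at in that comment (my assumption of its content); the trailing None entries play the role of np.newaxis:
diff = o1.ndim - flag.ndim
flag = flag[(slice(None),) + (None,) * diff]  # same as flag[:, None, None, None] here
o = np.where(flag, o1, o2)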

Multiplication between arrays of different shape in numpy

I am new to Python and I don't know exactly how to perform multiplication between arrays of different shapes.
I have two different arrays W and b such that:
W.shape = [32, 5, 20]
b.shape = [5,]
and I want to multiply
W[:, i, :]*b[i]
for each i from 0 to 4.
How can I do that? Thanks in advance.
You could add a new axis to b so it is multiplied across W's inner arrays' rows, i.e. the second axis:
W * b[:,None]
What you want to do is called broadcasting. In NumPy, you can multiply arrays this way, but only if their shapes match according to some restrictions:
Starting from the right, every component of the arrays' shapes must be equal, 1, or not exist.
so right now you have:
W.shape = (32, 5, 20)
b.shape = (5,)
since 20 and 5 don't match, they can't be broadcast.
If you were to have:
W.shape = (32, 5, 20)
b.shape = (5, 1)
then 20 would match with 1 (1 is always OK), the 5's would match, and you could then multiply them.
To get b's shape to (5, 1), you can either do .reshape(5, 1) (or, more robustly, .reshape(-1, 1)) or fancy-index with [:, None].
Thus either of these work:
W * b[:, None]        # yatu's answer
W * b.reshape(-1, 1)
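A quick sketch verifying that the broadcast product matches the per-i loop from the question (random data in the given shapes):
import numpy as np

W = np.random.rand(32, 5, 20)
b = np.random.rand(5)
loop = np.stack([W[:, i, :] * b[i] for i in range(5)], axis=1)
print(np.allclose(W * b[:, None], loop))  # True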

how to add list of arrays (tensors)

I am defining a simple conv2D function to calculate the cross-correlation between an input and a kernel (both 2D tensors) as below:
import torch

def conv2D(X, K):
    h = K.shape[0]
    w = K.shape[1]
    ĥ = X.shape[0] - h + 1
    ŵ = X.shape[1] - w + 1
    Y = torch.zeros((ĥ, ŵ))
    for i in range(ĥ):
        for j in range(ŵ):
            Y[i, j] = (X[i: i+h, j: j+w] * K).sum()
    return Y
When X and K are rank-3 tensors, I calculate the conv2D for each channel and then add the results together, as below:
def conv2D_multiple(X, K):
    cross = []
    result = 0
    for x, k in zip(X, K):
        cross.append(conv2D(x, k))
    for t in cross:
        result += t
    return result
To test my function:
X_2 = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float32)
K_2 = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]], dtype=torch.float32)
conv2D_multiple(X_2, K_2)
The result is:
tensor([[ 56.,  72.],
        [104., 120.]])
The result is as expected; however, I believe the second for loop inside my conv2D_multiple(X, K) function is redundant. My question is: how do I sum (element-wise) the tensors (arrays) in the list so that I can omit the second for loop?
Since your conv2D operates on a per-slice basis, what you can do is allocate a 3D tensor so that, in the first for loop, each result is stored in its own slice. You can then sum along the slice dimension using PyTorch's built-in torch.sum operator to get the same result. To make it palatable, I'll make the slice dimension dim=0. Therefore, replace cross from being an initially empty list with a 3D Torch tensor that stores the intermediate results, then compress along the slice dimension by summing. We can get away with this because your initial implementation stored the intermediate results as a list of 2D tensors; moving to 3D lets PyTorch sum along the slice axis.
This requires that you define the correct dimensions for this 3D tensor prior to looping:
def conv2D_multiple(X, K):
    h = K.shape[1]
    w = K.shape[2]
    ĥ = X.shape[1] - h + 1
    ŵ = X.shape[2] - w + 1
    c = X.shape[0]
    cross = torch.zeros((c, ĥ, ŵ), dtype=torch.float32)
    for i, (x, k) in enumerate(zip(X, K)):
        cross[i] = conv2D(x, k)
    result = cross.sum(dim=0)
    return result
Notice that for each input/kernel slice you iterate over, instead of appending to a new list we place the result directly into a slice of the intermediate tensor. Once these results are stored, we sum along the slice axis to compress them into the expected output. Running the new function above with your example inputs generates the same result.
If this isn't a desired result for you, another way is simply to take the list of tensors you created, build the intermediate tensor by stacking them all together with torch.stack, and sum. By default it stacks along the first axis (dim=0):
def conv2D_multiple(X, K):
    cross = []
    for x, k in zip(X, K):
        cross.append(conv2D(x, k))
    cross = torch.stack(cross)
    result = cross.sum(dim=0)
    return result
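A minimal standalone illustration of the stack-and-sum idiom (toy tensors, not the conv example):
import torch

ts = [torch.full((2, 2), float(v)) for v in (1, 2, 3)]
print(torch.stack(ts).sum(dim=0))
# tensor([[6., 6.],
#         [6., 6.]])
Python's built-in sum(cross) would also work, but stacking followed by a single torch.sum avoids creating an intermediate tensor for every addition.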

Most efficient way to perform large dot/tensor dot products while only keeping diagonal entries [duplicate]

This question already has answers here:
Matrix multiplication for multidimensional matrix (/array) - how to avoid loop?
(3 answers)
Closed 3 years ago.
I'm trying to figure out a way to use numpy to perform the following algebra in the most time-efficient way possible:
Given a 3D matrix/tensor, A, with shape (n, m, p) and a 2D matrix/tensor, B, with shape (n, p), calculate C_ij = sum_over_k (A_ijk * B_ik), where the resulting matrix C would have dimension (n, m).
I've tried two ways to do this. One is to loop through the first dimension, and calculate a regular dot product each time.
The other method is to use np.tensordot(A, B.T) to calculate a result with shape (n, m, n), and then take the diagonal elements along 1st and 3rd dimension. Both methods are shown below.
First method:
C = np.zeros((n, m))
for i in range(n):
    C[i] = np.dot(A[i], B[i])
Second method:
C = np.diagonal(np.tensordot(A, B.T, axes = 1), axis1=0, axis2=2).T
However, because n is very large, the loop over n in the first method costs a lot of time. The second method computes too many unnecessary entries to obtain that huge (n, m, n) matrix and also costs too much time. I'm wondering if there is a more efficient way to do this?
Define 2 arrays:
In [168]: A = np.arange(2*3*4).reshape(2,3,4); B = np.arange(2*4).reshape(2,4)
Your iterative approach:
In [169]: [np.dot(a,b) for a,b in zip(A,B)]
Out[169]: [array([14, 38, 62]), array([302, 390, 478])]
The einsum practically writes itself from your C_ij = sum_over_k (A_ijk * B_ik):
In [170]: np.einsum('ijk,ik->ij', A, B)
Out[170]:
array([[ 14,  38,  62],
       [302, 390, 478]])
The @ operator, matmul, was added to perform batch dot products; here the i dimension is the batch one. Since it uses the last axis of A and the 2nd-to-last of B for the dot summation, we have to temporarily expand B to (2, 4, 1):
In [171]: A @ B[..., None]
Out[171]:
array([[[ 14],
        [ 38],
        [ 62]],

       [[302],
        [390],
        [478]]])
In [172]: (A @ B[..., None])[..., 0]
Out[172]:
array([[ 14,  38,  62],
       [302, 390, 478]])
Typically matmul is the fastest, since it passes the task to BLAS-like code.
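A rough way to compare the three approaches on larger inputs, as a sketch (timings depend on your NumPy build and BLAS; the sizes here are made up):
import numpy as np
import timeit

n, m, p = 10000, 30, 40
A = np.random.rand(n, m, p)
B = np.random.rand(n, p)

loop = lambda: np.array([np.dot(a, b) for a, b in zip(A, B)])
ein = lambda: np.einsum('ijk,ik->ij', A, B)
mm = lambda: (A @ B[..., None])[..., 0]

for name, f in [('loop', loop), ('einsum', ein), ('matmul', mm)]:
    print(name, timeit.timeit(f, number=10))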
Here is my implementation:
B = np.expand_dims(B, axis=1)
E = A * B
E = np.sum(E, axis=-1)
Check:
import numpy as np
n, m, p = 2, 2, 2
np.random.seed(0)
A = np.random.randint(1, 10, (n, m, p))
B = np.random.randint(1, 10, (n, p))
C = np.diagonal(np.tensordot(A, B.T, axes = 1), axis1=0, axis2=2).T
# from here is my implementation
B = np.expand_dims(B, axis=1)
E = A * B
E = np.sum(E, axis=-1)
print(np.array_equal(C, E))
True
Use np.expand_dims() to add a new dimension, then use the broadcast multiply, and finally sum along the last dimension.
Thanks to user3483203 for the check code.
