Both .flatten() and .view(-1) flatten a tensor in PyTorch. What's the difference?
Does .flatten() copy the data of the tensor?
Is .view(-1) faster?
Is there any situation that .flatten() doesn't work?
In addition to #adeelh's comment, there is another difference: torch.flatten() results in a call to .reshape(), and the differences between .reshape() and .view() are:
[...] torch.reshape may return a copy or a view of the original tensor. You can not count on that to return a view or a copy.
Another difference is that reshape() can operate on both contiguous and non-contiguous tensors, while view() can only operate on contiguous tensors. Also see here about the meaning of contiguous.
For context:
The community had been requesting a flatten function for a while, and after Issue #7743 the feature was implemented in PR #8578.
You can see the implementation of flatten here, where a call to .reshape() can be seen in the return line.
flatten is simply a convenient alias of a common use-case of view.¹
There are several others:
Function               Equivalent view logic
flatten()              view(-1)
flatten(start, end)    view(*t.shape[:start], -1, *t.shape[end+1:])
squeeze()              view(*[s for s in t.shape if s != 1])
unsqueeze(i)           view(*t.shape[:i], 1, *t.shape[i:])
Note that flatten allows you to flatten a specific contiguous subset of dimensions, with the start_dim and end_dim arguments.
¹ Actually the superficially equivalent reshape() under the hood.
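As a quick sanity check (my own sketch, not part of the original answer), the equivalences above can be verified directly:

import torch

t = torch.arange(24).reshape(2, 3, 4)

# flatten() vs view(-1)
assert torch.equal(t.flatten(), t.view(-1))

# flatten(start, end) vs collapsing that range of dimensions
start, end = 1, 2
assert torch.equal(t.flatten(start, end),
                   t.view(*t.shape[:start], -1, *t.shape[end + 1:]))

# squeeze() vs dropping every size-1 dimension
u = torch.zeros(2, 1, 3, 1)
assert torch.equal(u.squeeze(), u.view(*[s for s in u.shape if s != 1]))

# unsqueeze(i) vs inserting a size-1 dimension at position i
i = 1
assert torch.equal(t.unsqueeze(i), t.view(*t.shape[:i], 1, *t.shape[i:]))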
First of all, .view() works only on contiguous data, while .flatten() works on both contiguous and non-contiguous data. Functions like transpose, which produce non-contiguous data, can be acted upon by .flatten() but not by .view().

Coming to copying of data: neither .view() nor .flatten() copies data when it operates on contiguous data. However, in the case of non-contiguous data, .flatten() first copies the data into contiguous memory and then changes the dimensions, so any change in the new tensor does not affect the original tensor.
ten=torch.zeros(2,3)
ten_view=ten.view(-1)
ten_view[0]=123
ten
>>tensor([[123., 0., 0.],
[ 0., 0., 0.]])
ten=torch.zeros(2,3)
ten_flat=ten.flatten()
ten_flat[0]=123
ten
>>tensor([[123., 0., 0.],
[ 0., 0., 0.]])
In the above code, the tensor ten has a contiguous memory allocation, so any changes to ten_view or ten_flat are reflected in ten.
ten=torch.zeros(2,3).transpose(0,1)
ten_flat=ten.flatten()
ten_flat[0]=123
ten
>>tensor([[0., 0.],
[0., 0.],
[0., 0.]])
In this case the non-contiguous, transposed tensor ten is passed to flatten(), so any changes made to ten_flat are not reflected in ten.
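One way to confirm when a copy is made (my own check, not part of the answer above) is to compare the data_ptr() of the original and the flattened tensor:

import torch

ten = torch.zeros(2, 3)
# Contiguous input: both view(-1) and flatten() share the original storage.
print(ten.view(-1).data_ptr() == ten.data_ptr())    # True
print(ten.flatten().data_ptr() == ten.data_ptr())   # True

# Non-contiguous input (a transpose): flatten() copies into new storage,
# while view(-1) would raise a RuntimeError here.
ten_t = torch.zeros(2, 3).transpose(0, 1)
print(ten_t.flatten().data_ptr() == ten_t.data_ptr())  # False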
Related
I am looking for a way to find a 2D pattern in a MxNxR tensor/array with pytorch or numpy.
For instance, to see whether each boolean pattern in a dictionary (e.g. {6x6 pattern: freq}) exists in a larger boolean tensor (e.g. 3x256x256).
Then I want to update the patterns and frequencies in the dictionary.
I was hoping there was a PyTorch way of doing it, instead of looping over the tensor, or at least an optimized loop for doing it.
As far as I know, torch.where works when we have a scalar value; I'm not sure what to do when I have a 6x6 tensor instead of a single value.
I looked into Finding Patterns in a Numpy Array , but I don't think that it's feasible to follow it for a 2D pattern.
I'm thinking maybe you can pull this off using convolutions. Let's imagine you have an input made up of 0s and 1s. Here we will take a minimal example with a 3x3 input and a 2x2 pattern:
>>> x = torch.tensor([[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.]])
And the pattern would be:
>>> pattern = torch.tensor([[1., 0.],
[0., 1.]])
Here the pattern can be found in the upper left corner of the input.
We perform a convolution with nn.functional.conv2d, using the pattern itself as the kernel:
>>> import torch.nn.functional as F
>>> img, mask = x[None, None], pattern[None, None]
>>> M = F.conv2d(img, mask)
>>> M
tensor([[[[2., 0.],
          [0., 1.]]]])
There is a match wherever the result equals the number of 1s in the pattern (i.e. every 1 of the pattern is found at the corresponding position of that window):
>>> M == mask.sum(dim=(2, 3))
tensor([[[[ True, False],
          [False, False]]]])
You can deduce the frequencies from this final boolean mask, and you can extend the method to multiple patterns by adding more kernels to the convolution.
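A runnable sketch along these lines (my own extension of the answer; the function name count_patterns and the shapes are illustrative, and a "match" here means that every 1 of the pattern is present in the window, as in the comparison above):

import torch
import torch.nn.functional as F

def count_patterns(x, patterns):
    # x: (C, H, W) tensor of 0s and 1s; patterns: list of (h, w) tensors of
    # 0s and 1s, all with the same (h, w). Returns the number of matching
    # windows per pattern, summed over the C channels.
    kernel = torch.stack(patterns)[:, None]                    # (P, 1, h, w)
    ones_per_pattern = kernel.sum(dim=(1, 2, 3)).view(1, -1, 1, 1)
    img = x[:, None].float()                                   # (C, 1, H, W)
    scores = F.conv2d(img, kernel.float())                     # (C, P, H-h+1, W-w+1)
    matches = scores == ones_per_pattern
    return matches.sum(dim=(0, 2, 3))                          # (P,) frequencies

x = torch.randint(0, 2, (3, 256, 256))
patterns = [torch.randint(0, 2, (6, 6)) for _ in range(4)]
print(count_patterns(x, patterns))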
I'm following an example in the book "Deep Learning with Python" by Francois Chollet.
There's an example (p. 70) where they convert an array of ints to an array of float32.
The relevant lines are
from keras.datasets import imdb
(tr_data, tr_labels), (ts_data, ts_labels) = imdb.load_data(num_words=10000)
...
import numpy as np
y_train = np.asarray(tr_labels).astype('float32')
tr_labels is simply an array of ints
array([1, 0, 0, ..., 0, 1, 0])
y_train is an array of float32
array([1., 0., 0., ..., 0., 1., 0.], dtype=float32)
But why do we need to call np.asarray() when this alone seems to do the trick:
y_train = tr_labels.astype('float32')
Just wondering if numpy.asarray() does some additional data processing I'm not aware of.
No, it's not necessary.
np.asarray is sometimes useful if you aren't sure what the datatype is (or if it can change), and it won't make a copy into a new array if the input is already an ndarray, so it shouldn't be a slowdown if tr_labels is already an array. Along a similar vein, if you want to allow subclasses of ndarray you can use np.asanyarray which will pass through any subclass of ndarray (such as sparse arrays, etc.) without extra copying. These are just two examples of the many array creation functions numpy provides from existing data. There are often multiple right answers, but sometimes one may be more efficient (memory wise) than another.
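A quick check of the no-copy behaviour (my own sketch, not from the answer):

import numpy as np

a = np.arange(5)
lst = [0, 1, 2, 3, 4]

print(np.asarray(a) is a)      # True  - already an ndarray, returned as-is
print(np.array(a) is a)        # False - np.array copies by default
print(np.asarray(lst) is lst)  # False - a list always has to be converted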
What is the difference between NumPy's np.array and np.asarray? When should I use one rather than the other? They seem to generate identical output.
The definition of asarray is:
def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)
So it is like array, except it has fewer options, and copy=False. array has copy=True by default.
The main difference is that array (by default) will make a copy of the object, while asarray will not unless necessary.
Since other questions are being redirected to this one which ask about asanyarray or other array creation routines, it's probably worth having a brief summary of what each of them does.
The differences are mainly about when to return the input unchanged, as opposed to making a new array as a copy.
array offers a wide variety of options (most of the other functions are thin wrappers around it), including flags to determine when to copy. A full explanation would take just as long as the docs (see Array Creation), but briefly, here are some examples:
Assume a is an ndarray, and m is a matrix, and they both have a dtype of float32:
np.array(a) and np.array(m) will copy both, because that's the default behavior.
np.array(a, copy=False) and np.array(m, copy=False) will copy m but not a, because m is not a base-class ndarray.
np.array(a, copy=False, subok=True) and np.array(m, copy=False, subok=True) will copy neither, because m is a matrix, which is a subclass of ndarray.
np.array(a, dtype=int, copy=False, subok=True) will copy both, because the dtype is not compatible.
Most of the other functions are thin wrappers around array that control when copying happens:
asarray: The input will be returned uncopied iff it's a compatible ndarray (copy=False).
asanyarray: The input will be returned uncopied iff it's a compatible ndarray or subclass like matrix (copy=False, subok=True).
ascontiguousarray: The input will be returned uncopied iff it's a compatible ndarray in contiguous C order (copy=False, order='C').
asfortranarray: The input will be returned uncopied iff it's a compatible ndarray in contiguous Fortran order (copy=False, order='F').
require: The input will be returned uncopied iff it's compatible with the specified requirements string.
copy: The input is always copied.
fromiter: The input is treated as an iterable (so, e.g., you can construct an array from an iterator's elements, instead of an object array with the iterator); always copied.
There are also convenience functions, like asarray_chkfinite (same copying rules as asarray, but raises ValueError if there are any nan or inf values), and constructors for subclasses like matrix or for special cases like record arrays, and of course the actual ndarray constructor (which lets you create an array directly out of strides over a buffer).
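A small sketch of those pass-through rules (my own illustration, not part of the answer):

import numpy as np

a = np.ones((3, 3), dtype=np.float32)   # base-class ndarray, C order
m = np.matrix(a)                        # ndarray subclass
f = np.asfortranarray(a)                # Fortran-ordered copy of a

print(np.asarray(a) is a)               # True:  compatible base-class ndarray
print(np.asarray(m) is m)               # False: subclass converted to base class
print(np.asanyarray(m) is m)            # True:  asanyarray lets subclasses through
print(np.ascontiguousarray(a) is a)     # True:  already C-contiguous
print(np.shares_memory(np.ascontiguousarray(f), f))  # False: Fortran order forces a copy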
The difference can be demonstrated by this example:
Generate a matrix.
>>> A = numpy.matrix(numpy.ones((3, 3)))
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
Use numpy.array to modify A. Doesn't work because you are modifying a copy.
>>> numpy.array(A)[2] = 2
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
Use numpy.asarray to modify A. It worked because you are modifying A itself.
>>> numpy.asarray(A)[2] = 2
>>> A
matrix([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
The differences are mentioned quite clearly in the documentation of array and asarray. The differences lie in the argument lists, and hence in how each function behaves depending on those parameters.
The function definitions are :
numpy.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
and
numpy.asarray(a, dtype=None, order=None)
The following arguments are those that may be passed to array and not to asarray, as mentioned in the documentation:
copy : bool, optional If true (default), then the object is copied.
Otherwise, a copy will only be made if __array__ returns a copy, if
obj is a nested sequence, or if a copy is needed to satisfy any of the
other requirements (dtype, order, etc.).
subok : bool, optional If True, then sub-classes will be
passed-through, otherwise the returned array will be forced to be a
base-class array (default).
ndmin : int, optional Specifies the minimum number of dimensions that
the resulting array should have. Ones will be pre-pended to the shape
as needed to meet this requirement.
asarray(x) is like array(x, copy=False)
Use asarray(x) when you want to ensure that x will be an array before any other operations are done. If x is already an array then no copy would be done. It would not cause a redundant performance hit.
Here is an example of a function that ensure x is converted into an array first.
def mysum(x):
    return np.asarray(x).sum()
Here's a simple example that demonstrates the difference.
The main difference is that array will make a copy of the original data, while asarray will share the original data when it can, so that modifying the new object also modifies the original array.
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
int_cvr = np.asarray(a, dtype = np.int64)
Because the requested dtype differs from a's dtype, asarray has to make a copy here, so the contents of the array a remain untouched and we can perform any operation on the data through int_cvr without modifying the original array.
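To make the role of the dtype explicit (my own follow-up sketch): with a different dtype asarray must copy, while with the same dtype it returns the original array itself:

import numpy as np

a = np.arange(0.0, 10.2, 0.12)

int_cvr = np.asarray(a, dtype=np.int64)   # dtype differs -> a copy is made
int_cvr[0] = 99
print(a[0])                               # 0.0, unchanged

same = np.asarray(a)                      # same dtype -> no copy, same object
same[0] = 99.0
print(a[0])                               # 99.0, changed through `same`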
Let's understand the difference between np.array() and np.asarray() with an example:
np.array(): Convert input data (list, tuple, array, or other sequence type) to an ndarray and copies the input data by default.
np.asarray(): Convert input data to an ndarray but do not copy if the input is already an ndarray.
# Create an array...
arr = np.ones(5)          # array([1., 1., 1., 1., 1.])
# Now try to modify `arr` through the `array` method. Let's see...
np.array(arr)[3] = 200
arr                       # array([1., 1., 1., 1., 1.])
No change in the array, because we modified a copy of arr.
Now, modify arr with the asarray() method.
np.asarray(arr)[3] = 200
arr                       # array([  1.,   1.,   1., 200.,   1.])
The change occurs in this array because we are working with the original array now.
In numpy, we use ndarray.reshape() for reshaping an array.
I noticed that in pytorch, people use torch.view(...) for the same purpose, but at the same time, there is also a torch.reshape(...) existing.
So I am wondering what the differences are between them and when I should use either of them?
torch.view has existed for a long time. It will return a tensor with the new shape. The returned tensor will share the underlying data with the original tensor.
See the documentation here.
On the other hand, it seems that torch.reshape has been introduced recently in version 0.4. According to the document, this method will
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
It means that torch.reshape may return a copy or a view of the original tensor. You can not count on that to return a view or a copy. According to the developer:
if you need a copy use clone() if you need the same storage use view(). The semantics of reshape() are that it may or may not share the storage and you don't know beforehand.
Another difference is that reshape() can operate on both contiguous and non-contiguous tensors, while view() can only operate on contiguous tensors. Also see here about the meaning of contiguous.
Although both torch.view and torch.reshape are used to reshape tensors, here are the differences between them.
As the name suggests, torch.view merely creates a view of the original tensor. The new tensor will always share its data with the original tensor. This means that if you change the original tensor, the reshaped tensor will change and vice versa.
>>> z = torch.zeros(3, 2)
>>> x = z.view(2, 3)
>>> z.fill_(1)
>>> x
tensor([[1., 1., 1.],
[1., 1., 1.]])
To ensure that the new tensor always shares its data with the original, torch.view imposes some contiguity constraints on the shapes of the two tensors [docs]. More often than not this is not a concern, but sometimes torch.view throws an error even if the shapes of the two tensors are compatible. Here's a famous counter-example.
>>> z = torch.zeros(3, 2)
>>> y = z.t()
>>> y.size()
torch.Size([2, 3])
>>> y.view(6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with input tensor's
size and stride (at least one dimension spans across two contiguous subspaces).
Call .contiguous() before .view().
torch.reshape doesn't impose any contiguity constraints, but also doesn't guarantee data sharing. The new tensor may be a view of the original tensor, or it may be a new tensor altogether.
>>> z = torch.zeros(3, 2)
>>> y = z.reshape(6)
>>> x = z.t().reshape(6)
>>> z.fill_(1)
tensor([[1., 1.],
[1., 1.],
[1., 1.]])
>>> y
tensor([1., 1., 1., 1., 1., 1.])
>>> x
tensor([0., 0., 0., 0., 0., 0.])
TL;DR:
If you just want to reshape tensors, use torch.reshape. If you're also concerned about memory usage and want to ensure that the two tensors share the same data, use torch.view.
view() will try to change the shape of the tensor while keeping the underlying data allocation the same, thus data will be shared between the two tensors. reshape() will create a new underlying memory allocation if necessary.
Let's create a tensor:
a = torch.arange(8).reshape(2, 4)
The memory is allocated as one contiguous block, 0 1 2 3 4 5 6 7 (the tensor is C contiguous, i.e. the rows are stored next to each other).
stride() gives the number of elements that must be skipped in that block to move to the next element along each dimension:
a.stride()
(4, 1)
We want its shape to become (4, 2), we can use view:
a.view(4,2)
The underlying data allocation has not changed, the tensor is still C contiguous:
a.view(4, 2).stride()
(2, 1)
Let's try with a.t(). transpose() doesn't modify the underlying memory allocation, and therefore a.t() is not contiguous.
a.t().is_contiguous()
False
Although it is not contiguous, the stride information is sufficient to iterate over the tensor
a.t().stride()
(1, 4)
view() doesn't work anymore:
a.t().view(2, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Now consider the (2, 4) shape we wanted to obtain with view(2, 4): what would the memory allocation have to look like?
The stride would have to be something like (4, 2), but we would also have to jump back towards the beginning of the block partway through, which a single stride per dimension cannot express. It doesn't work.
In this case, reshape() creates a new tensor with a different memory allocation in order to make the data contiguous.
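A quick check of this (my own addition), comparing storage pointers:

import torch

a = torch.arange(8).reshape(2, 4)

b = a.t().reshape(2, 4)                 # non-contiguous input: reshape copies
print(b.data_ptr() == a.data_ptr())     # False - new memory allocation
print(b.is_contiguous())                # True

c = a.view(4, 2)                        # contiguous input: view shares storage
print(c.data_ptr() == a.data_ptr())     # True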
Note that we can use view to split the first dimension of the transpose.
Unlike what is said in the accepted and other answers, view() can operate on non-contiguous tensors!
a.t().view(2, 2, 2)
a.t().view(2, 2, 2).stride()
(2, 1, 4)
According to the documentation:
For a tensor to be viewed, the new view size must be compatible with
its original size and stride, i.e., each new view dimension must
either be a subspace of an original dimension, or only span across
original dimensions d, d+1, …, d+k that satisfy the following
contiguity-like condition that ∀i=d,…,d+k−1,
stride[i]=stride[i+1]×size[i+1]
Here that's because the first two dimensions after applying view(2, 2, 2) are subspaces of the transpose's first dimension.
For more information about contiguity have a look at my answer in this thread
Tensor.reshape() is more robust. It will work on any tensor, while Tensor.view() works only on a tensor t where t.is_contiguous() == True.
Explaining non-contiguous vs. contiguous is another story, but you can always make the tensor t contiguous by calling t.contiguous(), and then you can call view() without the error.
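For example (a minimal sketch of that workaround):

import torch

t = torch.zeros(3, 2).t()      # a transpose: not contiguous
# t.view(6)                    # would raise a RuntimeError
v = t.contiguous().view(6)     # copy into contiguous memory first, then view
print(v.shape)                 # torch.Size([6])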
I would say the answers here are technically correct, but there's another reason for reshape to exist: PyTorch is usually considered more convenient than other frameworks because it is closer to Python and NumPy. It's interesting that the question involves NumPy.
Let's look at size and shape in PyTorch. size is a method, so you call it as x.size(). shape in PyTorch is not a method but an attribute, just as in NumPy, where you use x.shape. So it's handy to have both of them in PyTorch: if you come from NumPy, it is nice to be able to use the same names.
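For example (my own illustration of that point):

import torch
import numpy as np

t = torch.zeros(2, 3)
print(t.size())     # torch.Size([2, 3]) - a method, as in the original Torch API
print(t.shape)      # torch.Size([2, 3]) - an attribute, matching NumPy

a = np.zeros((2, 3))
print(a.shape)      # (2, 3) - NumPy only has the attribute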
Suppose I have a NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the remainder of the matrix. Is there a way to do this without copying the data?
Scipy doesn't have a way to do this without copying the data but you can do it yourself by changing the attributes that define the sparse matrix.
There are 4 attributes that make up the csr_matrix:
data: An array containing the actual values in the matrix
indices: An array containing the column index corresponding to each value in data
indptr: An array (of length n_rows + 1) whose i-th entry is the index into data of the first value in row i. If a row is empty, its entry is the same as the previous row's.
shape: A tuple containing the shape of the matrix
If you are simply adding a row of zeros to the bottom all you have to do is change the shape and indptr for your matrix.
x = np.ones((3,5))
x = csr_matrix(x)
x.toarray()
>> array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
# reshape is not implemented for csr_matrix but you can cheat and do it yourself.
x._shape = (4,5)
# Update indptr to let it know we added a row with nothing in it. So just append the last
# value in indptr to the end.
# note that you are still copying the indptr array
x.indptr = np.hstack((x.indptr,x.indptr[-1]))
x.toarray()
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 0., 0., 0., 0., 0.]])
Here is a function to handle the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays but it is still significantly faster than the scipy vstack method.
def csr_vappend(a,b):
""" Takes in 2 csr_matrices and appends the second one to the bottom of the first one.
Much faster than scipy.sparse.vstack but assumes the type to be csr and overwrites
the first matrix instead of copying it. The data, indices, and indptr still get copied."""
a.data = np.hstack((a.data,b.data))
a.indices = np.hstack((a.indices,b.indices))
a.indptr = np.hstack((a.indptr,(b.indptr + a.nnz)[1:]))
a._shape = (a.shape[0]+b.shape[0],b.shape[1])
return a
Not sure if you're still looking for a solution, but maybe others can look into hstack and vstack - http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix, as sketched below.
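A sketch of that approach (my own; note that, unlike the attribute-tweaking trick above, it does copy the data):

import numpy as np
from scipy.sparse import csr_matrix, vstack

M = csr_matrix(np.ones((3, 5)))
zero_row = csr_matrix((1, M.shape[1]))            # an all-zero 1xN sparse row
M_modified = vstack([M, zero_row], format='csr')
print(M_modified.shape)                           # (4, 5)
print(M_modified.toarray())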
I don't think that there is any way to really escape from doing the copying. Both of those types of sparse matrices store their data as Numpy arrays (in the data and indices attributes for csr and in the data and rows attributes for lil) internally and Numpy arrays can't be extended.
Update with more information:
LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The Numpy arrays used for data and rows are both of type object. Each of the objects in these arrays is actually a Python list (an empty list when all values in a row are zero). Python lists aren't exactly linked lists, but they are kind of close and, quite frankly, a better choice due to O(1) look-up. Personally, I don't immediately see the point of using a Numpy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.