Related
I have a tensor
import torch
torch.zeros((5,10))
>>> tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
How can I replace the values of X random cells in each row with random inputs (torch.rand())?
That is, if X = 2, in each row, 2 random cells should be replaced with torch.rand().
Since I need it to not break backpropagation I found here that replacing the .data attribute of the cells should work.
The only familiar thing to me is using a for loop but it's not efficient for a large tensor
You can try tensor.scatter_().
x = torch.zeros(3,4)
n_replace = 3 # number of cells to be replaced with random number
src = torch.randn(x.size())
index = torch.stack([torch.randperm(x.size()[1]) for _ in range(x.size()[0])])[:,:n_replace]
x.scatter_(1, index, src)
Out[22]:
tensor([[ 0.0000, 0.5769, 0.7432, -0.1776],
[-2.1673, -1.0802, 0.0000, 0.6241],
[-0.6421, 0.1315, 0.0000, -2.7224]])
To avoid repetition,
perm = torch.randperm(tensor.size(0))
idx = perm[:k]
samples = tensor[idx]
I'm a typical user of R, but in python I'm stuck.
I have a lot of images saved as NumPy array I need to resize the pad of array/images to 4k resolution from different widths which oscillated between 1620 to 2800, the height is constant: 2160.
I need to resize the pad of each array/image to 3840*2160, ie. add a black border on right and left side, so that the array/image itself remains unchanged.
For resizing I try this, but the code adds black edges to all sides.
arr = np.array([[1,1,1],[1,1,1],[1,1,1],[1,1,1]])
FinalWidth = 20
def pad_with(vector, pad_width, iaxis, kwargs):
pad_value = kwargs.get('padder', 0)
vector[:pad_width[0]] = pad_value
arr2 = np.pad(arr,FinalWidth/2,pad_with)
I think you just need hstack, assuming you want half the width to go on either side:
def pad_with(vector, pad_width):
temp = np.hstack((np.zeros((vector.shape[0], pad_width//2)), vector))
return np.hstack((temp, np.zeros((vector.shape[0], pad_width//2))))
arr2 = pad_with(arr,FinalWidth)
arr2
>>> array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.]])
arr2.shape
>>> (4, 23)
Given a list of coordinates that represent a rectangle in a grid (e.g. the upper-left and lower-right coordinate), which would be the most efficient way to fill a binary NumPy array with ones in the place of that rectangles?
The simple way would be to do a for loop such as
arr = np.zeros((w, h))
for x1, y1, x2, y2 in coordinates:
arr[x1:x2, y1:y2] = True
where coordinates is something like [(x_11, y_11, x_22, y_22), ..., (x_n1, y_n1, x_n2, y_n2)]
However, I want to try to avoid it, as it is one of the advantages of using vectorial inner NumPy operations. I have tried the logical_and but it seems that it works for a single rectangle or condition. How could I do it in a more "numpy" way?
The resulting image would be something like this for 2 rectangles:
Let say (1,1) are the upper-left coordinates of the rectangle,
and (5,4) the lower-right.
Then
arr = np.zeros((10, 10))
arr[1:5, 1:4] = 1
returns
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
How can I remove the NaN rows from the array below using indices (since I will need to remove the same rows from a different array.
array([[[nan, 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]],
[[ 0., 0., 0., 0.],
[ 0., nan, 0., 0.],
[ 0., 0., 0., 0.]]])
I get the indices of the rows to be removed by using the command
a[np.isnan(a).any(axis=2)]
But using what I would normally use on a 2D array does not produce the desired result, losing the array structure.
a[~np.isnan(a).any(axis=2)]
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
How can I remove the rows I want using the indices obtained from my first command?
You need to reshape:
a[~np.isnan(a).any(axis=2)].reshape(a.shape[0], -1, a.shape[2])
But be aware that the number of NaN-rows at each 2D subarray should be the same to get a new 3D array.
Studying Deep Learning with Python, I can't comprehend the following simple batch of code which encodes the integer sequences into a binary matrix.
def vectorize_sequences(sequences, dimension=10000):
# Create an all-zero matrix of shape (len(sequences), dimension)
results = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
results[i, sequence] = 1. # set specific indices of results[i] to 1s
return results
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
x_train = vectorize_sequences(train_data)
And the output of x_train is something like
x_train[0]
array([ 0., 1.,1., ...,0.,0.,0.])
Can someone put some light of the 0.'s existance in x_train array while only 1.'s are appending in each next i iteration?
I mean shouldn't be all 1's?
The script transforms you dataset into a binary vector space model. Let's disect things one by one.
First, if we examine the x_train content we see that each review is represented as a sequence of word ids. Each word id corresponds to one specific word:
print(train_data[0]) # print the first review
[1, 14, 22, 16, 43, 530, 973, ..., 5345, 19, 178, 32]
Now, this would be very difficult to feed the network. The lengths of reviews varies, fractional values between any integers have no meaning (e.g. what if on the output we get 43.5, what does it mean?)
So what we can do, is create a single looong vector, the size of the entire dictionary, dictionary=10000 in your example. We will then associate each element/index of this vector with one word/word_id. So word represented by word id 14 will now be represented by 14-th element of this vector.
Each element will either be 0 (word is not present in the review) or 1 (word is present in the review). And we can treat this as a probability, so we even have meaning for values in between 0 and 1. Furthermore, every review will now be represented by this very long (sparse) vector which has a constant length for every review.
So on a smaller scale if:
word word_id
I -> 0
you -> 1
he -> 2
be -> 3
eat -> 4
happy -> 5
sad -> 6
banana -> 7
a -> 8
the sentences would then be processed in a following way.
I be happy -> [0,3,5] -> [1,0,0,1,0,1,0,0,0]
I eat a banana. -> [0,4,8,7] -> [1,0,0,0,1,0,0,1,1]
Now I highlighted the word sparse. That means, there will have A LOT MORE zeros in comparison with ones. We can take advantage of that. Instead of checking every word, whether it is contained in a review or not; we will check a substantially smaller list of only those words that DO appear in our review.
Therefore, we can make things easy for us and create reviews × vocabulary matrix of zeros right away by np.zeros((len(sequences), dimension)). And then just go through words in each review and flip the indicator to 1.0 at position corresponding to that word:
result[review_id][word_id] = 1.0
So instead of doing 25000 x 10000 = 250 000 000 operations, we only did number of words = 5 967 841. That's just ~2.5% of original amount of operations.
The for loop here is not processing all the matrix. As you can see, it enumerates elements of the sequence, so it's looping only on one dimension.
Let's take a simple example :
t = np.array([1,2,3,4,5,6,7,8,9])
r = np.zeros((len(t), 10))
Output
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
then we modify elements with the same way you have :
for i, s in enumerate(t):
r[i,s] = 1.
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
you can see that the for loop modified only a set of elements (len(t)) which has index [i,s] (in this case ; (0, 1), (1, 2), (2, 3), an so on))
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
results = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
results[i, sequence] = 1.
return results