PyTorch softmax: What dimension to use?

The function torch.nn.functional.softmax takes two parameters: input and dim. According to its documentation, the softmax operation is applied to all slices of input along the specified dim, and will rescale them so that the elements lie in the range (0, 1) and sum to 1.
Let input be:
input = torch.randn((3, 4, 5, 6))
Suppose I want the following, so that every entry in that array is 1:
sum = torch.sum(input, dim=3, keepdim=True)  # sum's size is (3, 4, 5, 1)
How should I apply softmax?
softmax(input, dim = 0) # Way Number 0
softmax(input, dim = 1) # Way Number 1
softmax(input, dim = 2) # Way Number 2
softmax(input, dim = 3) # Way Number 3
My intuition tells me it is the last one, but I am not sure. English is not my first language, and the use of the word "along" seemed confusing to me because of that. In other words: given a tensor of size (s1, s2, s3, s4), I want the softmax applied so that summing over the last dimension yields all ones.

Steven's answer is not correct. See the snapshot below. It is actually the reverse way.
Image transcribed as code:
>>> x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
>>> F.softmax(x, dim=0)
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
>>> F.softmax(x, dim=1)
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])

The easiest way I can think of to make you understand is: say you are given a tensor of shape (s1, s2, s3, s4) and, as you mentioned, you want the sum of all the entries along the last axis to be 1.
sum = torch.sum(input, dim=3)  # input is of shape (s1, s2, s3, s4)
Then you should call the softmax as:
softmax(input, dim = 3)
To understand this easily, you can consider a 4d tensor of shape (s1, s2, s3, s4) as a 2d tensor (matrix) of shape (s1*s2*s3, s4). Now if you want the matrix to contain values that sum to 1 along each column (axis 0) or along each row (axis 1), then you can simply call the softmax function on the 2d tensor as follows:
softmax(input, dim = 0) # normalizes values along axis 0
softmax(input, dim = 1) # normalizes values along axis 1
You can see the example that Steven mentioned in his answer.
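If you want to convince yourself that this flattening picture is accurate, here is a small sketch (using the shapes from the question) checking that softmax over dim=3 of the 4d tensor matches softmax over dim=1 of its flattened 2d view:

import torch
import torch.nn.functional as F

t = torch.randn(3, 4, 5, 6)

# softmax over the last dim of the 4d tensor ...
a = F.softmax(t, dim=3)

# ... matches softmax over dim=1 of the flattened (s1*s2*s3, s4) matrix
b = F.softmax(t.reshape(-1, 6), dim=1).reshape(3, 4, 5, 6)

print(torch.allclose(a, b))  # True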

Let's consider the example in two dimensions:
x = [[1, 2],
     [3, 4]]
Do you want your final result to be
y = [[0.27, 0.73],
     [0.27, 0.73]]
or
y = [[0.12, 0.12],
     [0.88, 0.88]]?
If it's the first option, then you want dim = 1. If it's the second option, you want dim = 0.
Notice that in the second example it is the columns (the zeroth dimension) whose entries sum to 1, hence the result is normalized along the zeroth dimension.
Updated 2018-07-10: to reflect that the zeroth dimension refers to columns in pytorch.
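A quick snippet to verify both cases (a minimal check in the spirit of the transcription above):

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]])
print(F.softmax(x, dim=1).sum(dim=1))  # tensor([1., 1.]) -- each row sums to 1
print(F.softmax(x, dim=0).sum(dim=0))  # tensor([1., 1.]) -- each column sums to 1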

I am not 100% sure what your question means, but I think your confusion is simply that you don't understand what the dim parameter means. So I will explain it and provide examples.
If we have:
m0 = nn.Softmax(dim=0)
what that means is that m0 will normalize elements along the zeroth coordinate of the tensor it receives. Formally, given a tensor b of size, say, (d0, d1), and b0 = m0(b), the following will be true:

sum_{i0=0}^{d0-1} b0[i0, i1] = 1, for all i1 in {0, ..., d1-1}
you can easily check this with a Pytorch example:
>>> b = torch.arange(0, 4, 1.0).view(-1, 2)
>>> b
tensor([[0., 1.],
        [2., 3.]])
>>> m0 = nn.Softmax(dim=0)
>>> b0 = m0(b)
>>> b0
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
Now, since dim=0 means going through i0 in {0, 1} (i.e. going through the rows), if we choose any column i1 and sum its elements (the rows), then we should get 1. Check it:
>>> b0[:,0].sum()
tensor(1.0000)
>>> b0[:,1].sum()
tensor(1.0000)
as expected.
Note that we can check all columns at once by "summing out the rows" with torch.sum(b0, dim=0). Check it out:
>>> torch.sum(b0,0)
tensor([1.0000, 1.0000])
We can create a more complicated example to make sure it's really clear.
>>> a = torch.arange(0, 24, 1.0).view(-1, 3, 4)
>>> a
tensor([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])
>>> a0 = m0(a)
>>> a0[:,0,0].sum()
tensor(1.0000)
>>> a0[:,1,0].sum()
tensor(1.0000)
>>> a0[:,2,0].sum()
tensor(1.0000)
>>> a0[:,1,1].sum()
tensor(1.0000)
>>> a0[:,2,3].sum()
tensor(1.0000)
So, as we expected, if we sum all the elements along the first coordinate from the first value to the last value, we get 1. So everything is normalized along the first dimension (or first coordinate, i0).
>>> torch.sum(a0,0)
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
[1.0000, 1.0000, 1.0000, 1.0000],
[1.0000, 1.0000, 1.0000, 1.0000]])
Also, "along dimension 0" means that you vary the coordinate along that dimension and consider each element. It's sort of like having a for loop going through the values the first coordinate can take, i.e.
for i0 in range(0, d0):
    a[i0, b, c, d]
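To make "normalizing along dim 0" fully concrete, here is a minimal sketch computing the same thing by hand with broadcasting (the manual tensor is just for illustration):

import torch
import torch.nn as nn

b = torch.arange(0., 4.).view(2, 2)

# exponentiate, then divide each column by its own sum over dim 0,
# which is exactly the "for loop over i0" normalization described above
manual = torch.exp(b) / torch.exp(b).sum(dim=0, keepdim=True)

print(torch.allclose(manual, nn.Softmax(dim=0)(b)))  # True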

>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
>>> s1 = F.softmax(x, dim=0)
>>> s1
tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
>>> s2 = F.softmax(x, dim=1)
>>> s2
tensor([[0.2689, 0.7311],
        [0.2689, 0.7311]])
>>> torch.sum(s1, dim=0)
tensor([1., 1.])
>>> torch.sum(s2, dim=1)
tensor([1., 1.])

Think of what softmax is trying to achieve: it outputs the probability of one outcome against the others. Say you are trying to predict between two outcomes: is it A or is it B? If p(A) is greater than p(B), then the next step is to convert the outcome into a Boolean (i.e. the outcome is A if p(A) > 50%, or B if p(B) > 50%). Since we are dealing with probabilities, they should add up to 1.
Therefore, what you want is for the probabilities of each row to sum to 1, so you specify dim=1, the row sum.
On the other hand, if your model is designed to predict more than two classes, the output tensor will look something like [p(a), p(b), p(c), ..., p(i)].
What matters here is that p(a) + p(b) + p(c) + ... + p(i) = 1.
In that case you would use dim=0 on this one-dimensional output.
It all depends on how you define your output layer.
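To make the two layouts concrete, here is a small sketch (the batch size and class count are made up for illustration):

import torch
import torch.nn.functional as F

# batch of 4 samples, 2 classes each: the classes live on dim 1
logits_2d = torch.randn(4, 2)
print(F.softmax(logits_2d, dim=1).sum(dim=1))  # tensor([1., 1., 1., 1.])

# a single 1-D vector of class scores [p(a), ..., p(i)]: the classes live on dim 0
logits_1d = torch.randn(9)
print(F.softmax(logits_1d, dim=0).sum())  # tensor(1.)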

Related

Add a level to Numpy array

I have a problem with a numpy array.
In particular, suppose to have a matrix
x = np.array([[1., 2., 3.], [4., 5., 6.]])
with shape (2,3), I want to wrap each float in its own list, so as to obtain the array [[[1.], [2.], [3.]], [[4.], [5.], [6.]]] with shape (2,3,1).
I tried to convert each float number to a list (i.e., x[0][0] = [x[0][0]]) but it does not work.
Can anyone help me? Thanks
What you want is adding another dimension to your numpy array. One way of doing it is using reshape:
x = x.reshape(2,3,1)
output:
[[[1.]
  [2.]
  [3.]]

 [[4.]
  [5.]
  [6.]]]
There is a function in Numpy to perform exactly what @Valdi_Bo mentions. You can use np.expand_dims and add a new dimension along axis 2, as follows:
x = np.expand_dims(x, axis=2)
Refer:
np.expand_dims
Actually, you want to add a dimension (not level).
To do it, run:
result = x[...,np.newaxis]
Its shape is just (2, 3, 1).
Or save the result back under x.
You are trying to add a new dimension to the numpy array. There are multiple ways of doing this, as other answers mentioned: np.expand_dims, np.newaxis, np.reshape, etc. But I usually use the following, as I find it the most readable, especially when you are vectorizing multiple tensors and complex operations involving broadcasting (check this Bounty question that I solved with this method).
>>> x[:, :, None].shape
(2, 3, 1)
>>> x[None, :, None, :, None].shape
(1, 2, 1, 3, 1)
Well, maybe this is overkill for the array you have, but a very efficient solution is to use np.lib.stride_tricks.as_strided. This way no data is copied.
import numpy as np
x = np.array([[1., 2., 3.], [4., 5., 6.]])
newshape = x.shape[:-1] + (x.shape[-1], 1)
newstrides = x.strides + x.strides[-1:]
a = np.lib.stride_tricks.as_strided(x, shape=newshape, strides=newstrides)
results in:
array([[[1.],
        [2.],
        [3.]],

       [[4.],
        [5.],
        [6.]]])
>>> a.shape
(2, 3, 1)
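To verify the no-copy claim, np.shares_memory offers a quick check (a small aside I've added; it also shows that np.newaxis indexing likewise returns a view):

import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
a = np.lib.stride_tricks.as_strided(
    x, shape=x.shape + (1,), strides=x.strides + x.strides[-1:])

print(np.shares_memory(a, x))  # True -- no data was copied
# note: plain indexing with np.newaxis also returns a view, not a copy
print(np.shares_memory(x[..., np.newaxis], x))  # True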

How to calculate shape of a tensor in tensorflow

In order to understand tensors in tensorflow clearly, I need a clear understanding of how the shape of a tensor is defined.
These are some examples from the tensorflow document:
3 # a rank 0 tensor; this is a scalar with shape []
[1. ,2., 3.] # a rank 1 tensor; this is a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
Is the below understanding of mine correct:
In order to find the shape of the tensor, we start from the outermost list and count the number of elements (or lists) inside. This count makes the first dimension. We then repeat this procedure for the inner lists and find the next dimensions of the tensor.
Please correct me if I am wrong.
Yes, your understanding is correct. If you have a valid tensor, your algorithm will return the correct dimensions of the tensor. You can write it in python in the following way:
def get_shape(arr):
    res = []
    while isinstance(arr, list):
        res.append(len(arr))
        arr = arr[0]
    return res
Notice that for an arbitrary value of arr you also need to make sure that the dimensions match ([[1, 2, 3], [4, 5]] is not a valid tensor), for example as sketched below.
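For example, applied to the rank-3 tensor from the question, together with one possible version of the validity check mentioned above (the is_valid helper is an illustrative addition, not part of the original answer):

def get_shape(arr):
    res = []
    while isinstance(arr, list):
        res.append(len(arr))
        arr = arr[0]
    return res

def is_valid(arr):
    # valid if, at every level, sibling elements are lists of equal length
    # (or are all non-lists)
    if not isinstance(arr, list):
        return True
    lengths = {len(e) if isinstance(e, list) else None for e in arr}
    return len(lengths) <= 1 and all(is_valid(e) for e in arr)

print(get_shape([[[1., 2., 3.]], [[7., 8., 9.]]]))  # [2, 1, 3]
print(is_valid([[1, 2, 3], [4, 5]]))                # False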

Python - euclidean distance of all pairs of subsequences of given length from given array

Let's say I have a numpy array [5,7,2,3,4,6] and I choose the length of the subsequence to be 3.
I want to get euclidean distances of such subsequences.
Possible subsequences are:
[5,7,2]
[7,2,3]
[2,3,4]
[3,4,6]
Distance between subsequence 1. and 3. would be calculated as (5-2)^2 + (7-3)^2 + (2-4)^2. I want to do this for all pairs of subsequences.
Is there a way to avoid loops?
My real array is quite long so the solution should be memory efficient as well.
EDIT:
To elaborate more: I have a time series of 10^5 to 10^8 elements.
The time series is growing. Each time a new point is added, I need to take the L newest points and find the closest match to them among the past points of the dataset. (But I want all the distance values, not only the closest match.)
Repeating the whole calculation is unnecessary. The distances of the "previously newest L points" can be updated, modified only by subtracting the contribution of the point of age L+1 and adding that of the point of age 0 (the newest).
E.g. let's say the size of the time series is currently 100 and L=10. I calculate the distances of subsequence A[90:100] to all previous subsequences. When the 101st point arrives, I can reuse the distances and only update them by adding the squared differences involving the 101st point and subtracting those involving the 90th point.
EDIT 2:
Thanks a lot for the ideas, it looks like magic. I have one more idea that might be efficient, especially for the online case where new elements keep being added to the time series.
I am thinking about updating the distances this way. To calculate the distances of the first subsequence of length L=4, we need the first 4 columns of a matrix of pairwise differences between shifted copies of the series (the triangles on top and bottom can be omitted); the differences are then squared and summed column block by column block. (The original post illustrated this with a color-coded matrix; the image is omitted here.)
To obtain the distances of the second subsequence of L=4, we can actually reuse the previously calculated distances, subtract the first column (squared) from them, and add the 4th column (squared). For L=4 it might not make sense, but for L=100 it might. One distance has to be calculated from scratch. (Actually two, if the time series grows in size.)
This way I can keep in memory just the distances of one subsequence and update them to obtain the distances of the next subsequence.
Do you think this would be efficient with numpy? Is there an easy way to implement it?
Assuming A as the input array and L as the length of each subsequence, you can get a sliding 2D array version of A with broadcasting and then use pdist from scipy.spatial.distance, like so -
import numpy as np
from scipy.spatial.distance import pdist

# Get sliding 2D array version of input array
A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]

# Get pairwise distances with pdist
pairwise_dist = pdist(A2D, 'sqeuclidean')
Please note that if you meant euclidean distances, you need to replace 'sqeuclidean' with 'euclidean' or just leave out that argument as it's the default one.
Sample run -
In [209]: # Inputs
...: A = np.array([5,7,2,3,4,6])
...: L = 3
...:
In [210]: A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]
In [211]: A2D
Out[211]:
array([[5, 7, 2],
[7, 2, 3],
[2, 3, 4],
[3, 4, 6]])
In [212]: pdist(A2D,'sqeuclidean')
Out[212]: array([ 30., 29., 29., 27., 29., 6.])
# [1] element (= 29) is (5-2)^2 + (7-3)^2 + (2-4)^2
To get the corresponding IDs, you could use np.triu_indices like so -
idx1,idx2 = np.triu_indices(A2D.shape[0],1)
And, finally show IDs alongside the distances like so -
ID_dist = np.column_stack((idx1,idx2,pairwise_dist))
Sample run -
In [201]: idx1, idx2
Out[201]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))

In [202]: np.column_stack((idx1, idx2, pairwise_dist))
Out[202]:
array([[  0.,   1.,  30.],
       [  0.,   2.,  29.],   # This was your (5-2)^2 + (7-3)^2 + (2-4)^2
       [  0.,   3.,  29.],
       [  1.,   2.,  27.],
       [  1.,   3.,  29.],
       [  2.,   3.,   6.]])
For cases when you are dealing with millions of elements in A and L is in the hundreds, it might be a better idea to perform the computation for each pair of such subsequences in a loop, like so -
# Get pairwise IDs
idx1, idx2 = np.triu_indices(A.size-L+1, 1)

# Store range array for L, as it would be used frequently in the loop
R = np.arange(L)

# Initialize output array and start computing
pairwise_dist = np.empty(len(idx1))
for i in range(len(idx1)):
    pairwise_dist[i] = ((A[R+idx2[i]] - A[R+idx1[i]])**2).sum()
You can also use np.einsum to get the squared summation at each iteration, like so -
diffs = A[R+idx2[i]] - A[R+idx1[i]]
pairwise_dist[i] = np.einsum('i,i->', diffs, diffs)
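For the online scenario described in EDIT 2, the rolling update could look something like the sketch below. This is a rough illustration under the question's assumptions; update_dists and its bookkeeping conventions are mine, not a library routine. Each new point shifts the newest window by one, so every stored distance is adjusted by dropping one squared difference and adding another, and only the distance to the very first window is computed from scratch.

import numpy as np

def update_dists(A, d_prev, L):
    # A: the series so far, newest point at A[-1].
    # d_prev[j]: squared distance from the previous newest window to the
    # window starting at j, for j = 0 .. len(d_prev)-1.
    i = len(A) - L                      # start of the new newest window
    j = np.arange(len(d_prev))          # starts of the previously compared windows
    # both windows shift right by one: drop the oldest squared difference,
    # add the one contributed by the newest point A[-1]
    shifted = d_prev - (A[i-1] - A[j])**2 + (A[i+L-1] - A[j+L])**2
    # the distance to the very first window is computed from scratch
    d0 = ((A[i:i+L] - A[:L])**2).sum()
    return np.concatenate(([d0], shifted))

# quick check against brute force
A = np.array([5., 7., 2., 3., 4., 6., 1.])
L = 3
brute = lambda arr, i: np.array([((arr[i:i+L] - arr[k:k+L])**2).sum() for k in range(i)])
d_prev = brute(A[:-1], 3)   # distances before the newest point (1.) arrived
print(np.allclose(update_dists(A, d_prev, L), brute(A, 4)))  # True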

pybrain - ClassificationDataSet - how to understand the output when using SoftmaxLayer

I am trying to build my first classifier with a Pybrain neural network along with the specialized ClassificationDataSet, and I am not sure I fully understand how it works.
So I have a pandas dataframe with 6 feature columns and 1 column for class label (Survived, just 0 or 1).
I build a dataset out of it:
ds = ClassificationDataSet(6, 1, nb_classes=2)
for i in df[['Gender', 'Pclass', 'AgeFill', 'FamilySize', 'FarePerPerson', 'Deck', 'Survived']].values:
    ds.addSample(tuple(i[:-1]), i[-1])
ds._convertToOneOfMany()
return ds
Ok, I check what the dataset looks like:
>>> for i, m in ds:
...     i, m
(array([ 1., 3., 2., 2., 1., 8.]), array([1, 0]))
(array([ 0., 1., 1., 2., 0., 2.]), array([0, 1]))
And I already have a problem. What does [1, 0] or [0, 1] mean? Is it just the '0' or '1' of the original 'Survived' column? How do I get back to the original values?
Later, when I finish training my network:
net = buildNetwork(6, 6, 2, hiddenclass=TanhLayer, bias=True, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, ds)
trainer.trainEpochs(10)
I will try to activate it on another dataset (on which I want to do the actual classification), and I will get a pair of activation results for the 2 output neurons, but how do I know which output neuron corresponds to which original class? Probably this is something obvious, but I am not able to understand it from the docs, unfortunately.
Ok, it looks like pybrain uses the position to determine which class (0, 1) or (1, 0) means.
To get back to the original 0 or 1 label you need to use the argmax() function. So, for example, if I already have a trained network and want to validate it on the same data I used for training, I could do this:
true = 0
total = 0
for inProp, num in ds:
    out = net.activate(inProp).argmax()
    if out == num.argmax():
        true += 1
    total += 1
res = true / total  # beware integer division on Python 2
inProp will be a tuple of my input values for activation, num a tuple of the expected two-neuron output (either (0, 1) or (1, 0)), and num.argmax() will translate it into just 0 or 1, the real output.
I might be wrong since this is a pure heuristic, but it works in my example.
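Independent of pybrain, the one-hot encode/decode round trip this relies on can be sanity-checked with plain numpy (a minimal sketch):

import numpy as np

labels = np.array([0, 1, 1, 0])
one_hot = np.eye(2)[labels]       # 0 -> [1, 0], 1 -> [0, 1]
decoded = one_hot.argmax(axis=1)  # the position of the 1 recovers the class index
print(one_hot)
print(np.array_equal(decoded, labels))  # True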

Assigning values to two dimensional array from two one dimensional ones

Most probably somebody else has already asked this, but I couldn't find it. The question is: how can I assign values to a 2D array from two 1D arrays? For example:
import numpy as np
# a is the 2D array. b is the 1D array and should be assigned
# to the second coordinate. In this example the first coordinate is 1.
a=np.zeros((3,2))
b=np.asarray([1,2,3])
c=np.ones(3)
a=np.vstack((c,b)).T
output:
[[ 1.  1.]
 [ 1.  2.]
 [ 1.  3.]]
I know the way I am doing it is naive, but I am sure there should be a one-line way of doing this.
P.S. In the real case I am dealing with, this is a subarray of an array, and therefore I cannot set the first coordinate to one from the beginning. The whole array's first coordinates are different, but after applying np.where they become constant.
How about 2 lines?
>>> c = np.ones((3, 2))
>>> c[:, 1] = [1, 2, 3]
And the proof it works:
>>> c
array([[ 1.,  1.],
       [ 1.,  2.],
       [ 1.,  3.]])
Or, perhaps you want np.column_stack:
>>> np.column_stack(([1., 1, 1], [1, 2, 3]))
array([[ 1.,  1.],
       [ 1.,  2.],
       [ 1.,  3.]])
First, there's absolutely no reason to create the original zeros array that you stick in a, never reference, and replace with a completely different array with the same name.
Second, if you want to create an array the same shape and dtype as b but with all ones, use ones_like.
So:
b = np.array([1,2,3])
c = np.ones_like(b)
d = np.vstack((c, b)).T
You could of course expand b to a 3x1-array instead of a 3-array, in which case you can use hstack instead of needing to vstack then transpose… but I don't think that's any simpler:
b = np.array([1,2,3])
b = np.expand_dims(b, 1)
c = np.ones_like(b)
d = np.hstack((c, b))
If you insist on 1 line, use fancy indexing:
>>> a[:,0],a[:,1]=[1,1,1],[1,2,3]
