Why is pytorch softmax function not working? - python

So this is my code:
import torch.nn.functional as F
import torch
inputs = [1,2,3]
input = torch.tensor(inputs)
output = F.softmax(input, dim=1)
print(output)
Is the reason the code isn't working because of the dim?
The error is here:
File "c:\Users\user\Desktop\AI\pytorch_jovian\linear_reg.py", line 19, in <module>
output = F.softmax(input, dim=1)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py", line 1583, in softmax
ret = input.softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Apart from the dim issue (your tensor is 1-dimensional, so the only valid value is dim=0 or dim=-1), there is another problem in your code: softmax doesn't work on a long (integer) tensor, so it should be converted to a float or double tensor first
>>> input = torch.tensor([1, 2, 3])
>>> input
tensor([1, 2, 3])
>>> F.softmax(input.float(), dim=0)
tensor([0.0900, 0.2447, 0.6652])
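Alternatively, you can create the tensor as floating point in the first place so no conversion is needed; a minimal sketch:
import torch
import torch.nn.functional as F

# Building the tensor with a float dtype avoids the long-tensor problem entirely.
inputs = torch.tensor([1, 2, 3], dtype=torch.float32)
output = F.softmax(inputs, dim=0)   # dim=0 is the only dimension of a 1-D tensor
print(output)                       # tensor([0.0900, 0.2447, 0.6652])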

Related

Weird behaviour when adding matrices [duplicate]

I recently started to follow along with Siraj Raval's Deep Learning tutorials on YouTube, but an error came up when I tried to run my code. The code is from the second episode of his series, How To Make A Neural Network. When I ran the code I got the error:
Traceback (most recent call last):
File "C:\Users\dpopp\Documents\Machine Learning\first_neural_net.py", line 66, in <module>
neural_network.train(training_set_inputs, training_set_outputs, 10000)
File "C:\Users\dpopp\Documents\Machine Learning\first_neural_net.py", line 44, in train
self.synaptic_weights += adjustment
ValueError: non-broadcastable output operand with shape (3,1) doesn't match the broadcast shape (3,4)
I checked multiple times with his code and couldn't find any differences, and even tried copying and pasting his code from the GitHub link. This is the code I have now:
from numpy import exp, array, random, dot

class NeuralNetwork():
    def __init__(self):
        # Seed the random number generator, so it generates the same numbers
        # every time the program runs.
        random.seed(1)

        # We model a single neuron, with 3 input connections and 1 output connection.
        # We assign random weights to a 3 x 1 matrix, with values in the range -1 to 1
        # and mean 0.
        self.synaptic_weights = 2 * random.random((3, 1)) - 1

    # The Sigmoid function, which describes an S shaped curve.
    # We pass the weighted sum of the inputs through this function to
    # normalise them between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # The derivative of the Sigmoid function.
    # This is the gradient of the Sigmoid curve.
    # It indicates how confident we are about the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # We train the neural network through a process of trial and error.
    # Adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # Pass the training set through our neural network (a single neuron).
            output = self.think(training_set_inputs)

            # Calculate the error (The difference between the desired output
            # and the predicted output).
            error = training_set_outputs - output

            # Multiply the error by the input and again by the gradient of the Sigmoid curve.
            # This means less confident weights are adjusted more.
            # This means inputs, which are zero, do not cause changes to the weights.
            adjustment = dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

    # The neural network thinks.
    def think(self, inputs):
        # Pass inputs through our neural network (our single neuron).
        return self.__sigmoid(dot(inputs, self.synaptic_weights))

if __name__ == '__main__':
    # Initialize a single neuron neural network
    neural_network = NeuralNetwork()

    print("Random starting synaptic weights:")
    print(neural_network.synaptic_weights)

    # The training set. We have 4 examples, each consisting of 3 input values
    # and 1 output value.
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]])

    # Train the neural network using a training set
    # Do it 10,000 times and make small adjustments each time
    neural_network.train(training_set_inputs, training_set_outputs, 10000)

    print("New Synaptic weights after training:")
    print(neural_network.synaptic_weights)

    # Test the neural net with a new situation
    print("Considering new situation [1, 0, 0] -> ?:")
    print(neural_network.think(array([[1, 0, 0]])))
Even after copying and pasting the same code that worked in Siraj's episode, I'm still getting the same error.
I just started looking into artificial intelligence and don't understand what the error means. Could someone please explain what it means and how to fix it? Thanks!
Change self.synaptic_weights += adjustment to
self.synaptic_weights = self.synaptic_weights + adjustment
self.synaptic_weights must have a shape of (3,1) and adjustment must have a shape of (3,4). While the shapes are broadcastable, numpy won't assign the broadcast result, which has shape (3,4), back into the (3,1) array that += updates in place.
>>> import numpy as np
>>> a = np.ones((3, 1), dtype=int)
>>> b = np.random.randint(1, 10, (3, 4))
>>> a
array([[1],
[1],
[1]])
>>> b
array([[8, 2, 5, 7],
[2, 5, 4, 8],
[7, 7, 6, 6]])
>>> a + b
array([[9, 3, 6, 8],
[3, 6, 5, 9],
[8, 8, 7, 7]])
>>> b += a
>>> b
array([[9, 3, 6, 8],
[3, 6, 5, 9],
[8, 8, 7, 7]])
>>> a
array([[1],
[1],
[1]])
>>> a += b
Traceback (most recent call last):
File "<pyshell#24>", line 1, in <module>
a += b
ValueError: non-broadcastable output operand with shape (3,1) doesn't match the broadcast shape (3,4)
The same error occurs when using numpy.add and specifying a as the output array
>>> np.add(a,b, out = a)
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
np.add(a,b, out = a)
ValueError: non-broadcastable output operand with shape (3,1) doesn't match the broadcast shape (3,4)
>>>
A new a needs to be created
>>> a = a + b
>>> a
array([[10, 4, 7, 9],
[ 4, 7, 6, 10],
[ 9, 9, 8, 8]])
>>>
Hopefully you have got the code running by now, but the difference between his code and your code is this line:
training_set_outputs = array([[0, 1, 1, 0]]).T
When transposing, don't forget the two sets of square brackets. I had the same problem with the same code, and this worked for me.
Thanks
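For illustration, a minimal sketch of why the transpose matters: it changes the shape of the targets from (1, 4) to (4, 1), which is what makes error, and therefore adjustment, come out as (3, 1) instead of (3, 4):
import numpy as np

outputs_row = np.array([[0, 1, 1, 0]])       # shape (1, 4) -- what the question uses
outputs_col = np.array([[0, 1, 1, 0]]).T     # shape (4, 1) -- what the tutorial uses
print(outputs_row.shape, outputs_col.shape)  # (1, 4) (4, 1)

# With (4, 1) targets, the error for the 4 training examples is (4, 1), so
# dot(training_set_inputs.T, error * ...) with inputs of shape (4, 3) yields a
# (3, 1) adjustment that can be added in place to the (3, 1) weights.
# With (1, 4) targets, the error broadcasts to (4, 4) and adjustment becomes (3, 4).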

Concatenate identity matrix to each vector

I want to modify my input by adding several different suffixes to the input vectors. For example, if the (single) input is [1, 5, 9, 3] I want to create three vectors (stored as a matrix) like this:
[[1, 5, 9, 3, 1, 0, 0],
[1, 5, 9, 3, 0, 1, 0],
[1, 5, 9, 3, 0, 0, 1]]
Of course, this is just one observation, so the input to the model is (None, 4) in this case. The simple way is to prepare the input data somewhere else (numpy most probably) and adjust the shape of the input accordingly. That I can do, but I would prefer doing it inside TensorFlow/Keras.
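For reference, the numpy version of that preparation would look roughly like this (a small sketch, with x as a single observation):
import numpy as np

x = np.array([1, 5, 9, 3])   # a single observation, shape (4,)
dim_eye = 3

# Repeat the observation once per suffix and append the identity matrix columns.
repeated = np.tile(x, (dim_eye, 1))                                       # shape (3, 4)
result = np.concatenate([repeated, np.eye(dim_eye, dtype=int)], axis=1)   # shape (3, 7)
print(result)
# [[1 5 9 3 1 0 0]
#  [1 5 9 3 0 1 0]
#  [1 5 9 3 0 0 1]]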
I have isolated the problem into this code:
import keras.backend as K
from keras import Input, Model
from keras.layers import Lambda

def build_model(dim_input: int, dim_eye: int):
    input = Input((dim_input,))
    concat = Lambda(lambda x: concat_eye(x, dim_input, dim_eye))(input)
    return Model(inputs=[input], outputs=[concat])

def concat_eye(x, dim_input, dim_eye):
    x = K.reshape(x, (-1, 1, dim_input))
    x = K.repeat_elements(x, dim_eye, axis=1)
    eye = K.expand_dims(K.eye(dim_eye), axis=0)
    eye = K.tile(eye, (-1, 1, 1))
    out = K.concatenate([x, eye], axis=2)
    return out

def main():
    import numpy as np

    n = 100
    dim_input = 20
    dim_eye = 3

    model = build_model(dim_input, dim_eye)
    model.compile(optimizer='sgd', loss='mean_squared_error')

    x_train = np.zeros((n, dim_input))
    y_train = np.zeros((n, dim_eye, dim_eye + dim_input))

    model.fit(x_train, y_train)

if __name__ == '__main__':
    main()
The problem seems to be the -1 in the shape argument of the tile function. I tried replacing it with 1 and None. Each gives its own error:
-1: error during model.fit
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected multiples[0] >= 0, but got -1
1: error during model.fit
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [32,3,20] vs. shape[1] = [1,3,3]
None: error during build_model:
Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 1, 1). Consider casting elements to a supported type.
You need to use K.shape() instead to get the symbolic shape of the input tensor. That's because the batch size is None, and therefore passing K.int_shape(x)[0], None, or -1 as part of the second argument of K.tile() would not work:
eye = K.tile(eye, (K.shape(x)[0], 1, 1))
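Applied to the question's concat_eye, the fix looks like this (only the K.tile line changes):
def concat_eye(x, dim_input, dim_eye):
    x = K.reshape(x, (-1, 1, dim_input))
    x = K.repeat_elements(x, dim_eye, axis=1)
    eye = K.expand_dims(K.eye(dim_eye), axis=0)
    # Tile once per sample using the symbolic batch size instead of -1, 1, or None.
    eye = K.tile(eye, (K.shape(x)[0], 1, 1))
    out = K.concatenate([x, eye], axis=2)
    return out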

How to multiply weighted value to matrix in keras?

I have two tensors, weighted with shape (None, 3) and D with shape (None, 3, 5). I want to multiply weighted into D, as in weighted * D, giving shape (None, 3, 5).
I attached my image below. Each scalar value of weighted is multiplied with every element of the corresponding row of D.
So I tried multiply([weighted, D]), but I got an error: ValueError: Operands could not be broadcast together with shapes (3, 5) (3,). I assume this is caused by the different shapes of the inputs. How do I fix this?
Update
multiply([weighted, Permute((2, 1))(D)]) worked. I am not sure why, but it seems the last elements of the shapes must match.
You can reshape weighted and use broadcasting to accomplish that. Like this:
weighted = weighted.reshape(-1, 3, 1)
result = weighted * D
Update 1: The same concept (broadcasting) can be used, for instance, in tensorflow with tf.expand_dims(weighted, dim=2). My POC:
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

anp = np.array([[1, 2, 10], [2, 1, 10]])
bnp = np.random.random((2, 3, 5))

with tf.Session() as sess:
    weighted = tf.placeholder(tf.float32, shape=(None, 3))
    D = tf.placeholder(tf.float32, shape=(None, 3, 5))
    rweighted = tf.expand_dims(weighted, dim=2)
    result = rweighted * D
    r = sess.run(result, feed_dict={weighted: anp, D: bnp})
    print(bnp)
    print("--")
    print(r)
For Keras, use the backend API:
from keras import backend as K
...
K.expand_dims(weighted, 2)
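Inside a Keras model, that expansion would typically live in a Lambda layer; a minimal sketch assuming the shapes from the question:
from keras import backend as K
from keras.layers import Input, Lambda
from keras.models import Model

weighted = Input(shape=(3,))     # (None, 3)
D = Input(shape=(3, 5))          # (None, 3, 5)

# Expand weighted to (None, 3, 1) and rely on backend broadcasting over the last axis.
product = Lambda(lambda t: K.expand_dims(t[0], 2) * t[1])([weighted, D])  # (None, 3, 5)

model = Model(inputs=[weighted, D], outputs=product)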

tf.gradients return all zero vector when called on a tf.gradients result

I have the following code snippet:
interpolates = alpha*real_data + ((1-alpha)*fake_data)
disc_interpolates = Discriminator(interpolates)
gradients = tf.gradients(disc_interpolates, [interpolates])[0]
second_grad = tf.gradients(gradients[0], [interpolates])[0]

def Discriminator(inputs):
    output = tf.reshape(inputs, [-1, 3, 32, 32])
    output = lib.ops.conv2d.Conv2D('Discriminator.1', 3, DIM, 5, output, stride=2)
    output = LeakyReLU(output)
    output = lib.ops.conv2d.Conv2D('Discriminator.2', DIM, 2*DIM, 5, output, stride=2)
    output = LeakyReLU(output)
    output = lib.ops.conv2d.Conv2D('Discriminator.3', 2*DIM, 4*DIM, 5, output, stride=2)
    output = LeakyReLU(output)
    output = tf.reshape(output, [-1, 4*4*4*DIM])
    output = lib.ops.linear.Linear('Discriminator.Output', 4*4*4*DIM, 1, output)
    return tf.reshape(output, [-1])
Here Discriminator corresponds to a neural network. The first call to tf.gradients works correctly and I get back non-zero slope values in the gradients variable. However, whenever I try to find the second derivative by applying tf.gradients to the gradients variable, the result is always a vector of zeroes. I don't expect this to be the case because the network shouldn't be perfectly linear. Am I using tf.gradients incorrectly?
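As a sanity check on the mechanics (a standalone TF 1.x graph-mode sketch, not the discriminator above), applying tf.gradients twice to a genuinely nonlinear function does produce non-zero second derivatives:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[3])
y = tf.reduce_sum(x ** 3)

first = tf.gradients(y, [x])[0]        # 3 * x**2
second = tf.gradients(first, [x])[0]   # 6 * x (gradient of the summed first gradients)

with tf.Session() as sess:
    print(sess.run([first, second], feed_dict={x: [1., 2., 3.]}))
    # [array([ 3., 12., 27.], dtype=float32), array([ 6., 12., 18.], dtype=float32)]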
