3D points coordinate regression - Python

I have 30 sets of 3D points which are the keypoints describing 30 objects; each set contains 10 points. These points are denoted as X with shape [30, 10, 3]. I also have the corresponding 3D points of the 30 objects after a certain transformation, denoted as Y with shape [30, 10, 3].
Now I want to train a machine learning model on these 30 objects, using X and Y as data and annotations, and predict the keypoint coordinates of a new object after the transformation.
Does anybody have ideas on how to do this with Python?

30 sets seems too low for this task, since each flattened input and output has 30 dimensions (10 points × 3 coordinates). To learn a mapping between such high-dimensional data, you typically need thousands of examples (more sets).
I would suggest simulating the transformation to generate a few thousand samples, then using a small neural network to predict Y from X. Since the inputs have no spatial or temporal structure but instead represent discrete points, I don't think convolutional or recurrent models will be useful.
So, start with a small MLP trained with a mean squared error loss. However, if the output coordinates are always integers, you could also model this as a classification problem, provided the range of values is not too big.
I have added a small Keras neural network model below that predicts the transformation.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Simulate data: 3000 sets of 10 3D points, plus a simple transformation
X = np.random.randint(-100, 100, (3000, 10, 3))  # 10 3D points per sample
Y = 2 * (X + 5) / 7                              # this is our simple transformation operation
print(X.shape)
print(Y.shape)

# Small MLP: flatten each (10, 3) set into a 30-dimensional vector
in_m = Input(shape=(30,))                      # input layer
f1_fc = Dense(100, activation='relu')(in_m)    # first fully connected layer
f2_fc = Dense(30, activation='linear')(f1_fc)  # second fully connected layer

simple_model = Model(in_m, f2_fc)
simple_model.summary()
simple_model.compile(loss='mse', metrics=['mae'], optimizer='adam')

X_flat = np.reshape(X, (3000, 30))
Y_flat = np.reshape(Y, (3000, 30))
hist = simple_model.fit(X_flat, Y_flat, epochs=100, validation_split=0.2, batch_size=20)
Output:
(3000, 10, 3)
(3000, 10, 3)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) (None, 30) 0
_________________________________________________________________
dense_15 (Dense) (None, 100) 3100
_________________________________________________________________
dense_16 (Dense) (None, 30) 3030
=================================================================
Total params: 6,130
Trainable params: 6,130
Non-trainable params: 0
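Once trained, predicting the transformed keypoints of a new object is a reshape, a predict call, and a reshape back. A minimal sketch (my addition), assuming simple_model from above and a hypothetical new_object array of shape (10, 3):
# Hypothetical new object: 10 keypoints in 3D
new_object = np.random.randint(-100, 100, (10, 3))

# Flatten to the (1, 30) shape the MLP expects, predict, and reshape back
pred_flat = simple_model.predict(new_object.reshape(1, 30))
pred_points = pred_flat.reshape(10, 3)  # predicted keypoints after transformation
print(pred_points)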

Related

keras conv1d layer weight count is not produced as expected

I'm trying to convert a trained sign language classification solution in Python to C language headers so that I can deploy it on a Cortex-M4 CPU board.
In Python, I'm able to build the model and train it, and I can see it predicting with 90% accuracy.
But I see an issue with the number of weights used/generated in the convolution layers.
**Conv_1d configuration**
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

print(x_train.shape)

model = Sequential()
model.add(Conv1D(32, kernel_size=5, padding='same',
                 input_shape=x_train.shape[1:], name='conv1d_1'))
print(model.layers[0].kernel.numpy().shape)
**output:**
(1742, 45, 45)
**(5, 45, 32)**
According to the above configuration:
input dimension = 45x45x1 pixels of image (grayscale)
input channels = 1
output dimension = 45x45x32
output channels = 32
kernel size = 5
As per the concept (w.r.t. https://cs231n.github.io/convolutional-networks/):
number of weights = (input_channels) x (kernel_size) x (kernel_size) x (output_channels) = 1x5x5x32 = 800
But the Keras model produces a weights array of size [5][45][32] = 7200.
I'm not sure if my interpretation of the weight array in the Keras model is correct; I would be glad if someone could help me with this.
Some bullets that should clarify your doubts.
Your formula for the number of weights can't be right because you're using a Conv1D, so the kernel size has only one dimension.
Defining the input shape as x_train.shape[1:] = (45, 45) corresponds to a sequence of 45 elements, each with 45 channels (again, because it's a Conv1D).
That said, the number of weights is:
number of weights = input_channels x kernel_size x output_channels = 45 x 5 x 32 = 7200 (without biases)
Considering that you have images, you're probably looking for Conv2D. In this case, the input shape should be (45, 45, 1), the kernel has two dimensions, and the number of parameters is exactly 800 (without biases):
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=5, padding='same',
                                 input_shape=(45, 45, 1), use_bias=False))
model.summary()
# Layer (type) Output Shape Param #
# conv (Conv2D) (None, 45, 45, 32) 800
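For comparison, a minimal sketch (my addition) that reproduces the 7200 figure with the Conv1D configuration from the question:
import tensorflow as tf

# Conv1D over a (45, 45) input: a sequence of 45 elements with 45 channels each
model_1d = tf.keras.Sequential()
model_1d.add(tf.keras.layers.Conv1D(32, kernel_size=5, padding='same',
                                    input_shape=(45, 45), use_bias=False))
model_1d.summary()
# conv1d (Conv1D)  (None, 45, 32)  7200   <- 45 x 5 x 32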

Keras CNN for non-image matrix

I have recently started learning about Deep Learning and Reinforcement Learning, and I am trying to figure out how to code a Convolutional Neural Network using Keras for a matrix of 0s and 1s with 10 rows and 3 columns.
The input matrix would look like this for example
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
...
]
The output should be another matrix of 0s and 1s, different from the aforementioned input matrix and with a different number of rows and columns.
The location of 0s and 1s in the output matrix is dependent on the location of the 0s and 1s in the input matrix.
There is also a second output, an array where the values are dependent on the location of the 1 in the input matrix.
I have searched the internet for code examples but couldn't find anything useful.
Edit:
The input to the neural network is a 2D array with 10 rows and each row has 3 columns.
The output (for now at least) is a 2D array with 12 rows and each row has 10 columns (the same as the number of rows in the input 2D array).
This is what I came up with so far and I have no idea if it's correct or not.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

nbre_users = 10        # number of rows in the input 2D matrix
nbre_subchannels = 12  # number of rows in the output 2D matrix

model = Sequential()
model.add(Dense(50, input_shape=(nbre_users, 3), kernel_initializer="he_normal", activation="relu"))
model.add(Dense(20, kernel_initializer="he_normal", activation="relu"))
model.add(Dense(5, kernel_initializer="he_normal", activation="relu"))
model.add(Flatten())
model.add(Dense(nbre_subchannels))
model.add(Dense(nbre_users, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=1e-4), loss='mean_squared_error')
Here is the model summary: (posted as an image in the original question)
After clarifications, here is my answer.
The problem you are trying to solve seems to be a neural network that transforms a 2D grayscale image of size (10,3,1) to a 2D grayscale image of size (12,10,1).
A 2D grayscale image is nothing but a 2D matrix with an extra axis set to 1.
import numpy as np

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
a.shape
#OUTPUT = (3, 3)

a.reshape((3, 3, 1))  # reshape to (3, 3, 1)
#OUTPUT -
#array([[[0],
#        [1],
#        [0]],
#
#       [[1],
#        [0],
#        [1]],
#
#       [[0],
#        [1],
#        [0]]])
So a 2D matrix of shape (10, 3) can be treated as a single-channel image of shape (10, 3, 1). This will allow you to properly apply convolutions to your input.
If this part is clear, then for the forward computation of the network, since you want to ensure that the spatial positions of the 1s and 0s are captured, you want to use convolution layers. Using Dense layers here is not the right step.
However, a series of convolution operations only downsamples an image. Since you need a 2D matrix (grayscale image) as output, you also want to upsample. Such a network is called a deconv network.
The first series of layers convolves over the input, 'flattening' it into a vector of channels. The next set of layers uses 2D transposed convolutions to turn the channels back into a 2D matrix (grayscale image).
Here is sample code that shows how you can take a (10, 3, 1) image to a (12, 10, 1) image using a deconv net:
from tensorflow.keras import layers, Model

inp = layers.Input((10, 3, 1))              ##
x = layers.Conv2D(2, (2, 2))(inp)           ## Convolution part
x = layers.Conv2D(4, (2, 2))(x)             ##

x = layers.Conv2DTranspose(4, (3, 4))(x)    ##
x = layers.Conv2DTranspose(2, (2, 4))(x)    ## Deconvolution part
out = layers.Conv2DTranspose(1, (2, 4))(x)  ##

model = Model(inp, out)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_33 (InputLayer) [(None, 10, 3, 1)] 0
_________________________________________________________________
conv2d_49 (Conv2D) (None, 9, 2, 2) 10
_________________________________________________________________
conv2d_50 (Conv2D) (None, 8, 1, 4) 36
_________________________________________________________________
conv2d_transpose_46 (Conv2DT (None, 10, 4, 4) 196
_________________________________________________________________
conv2d_transpose_47 (Conv2DT (None, 11, 7, 2) 66
_________________________________________________________________
conv2d_transpose_48 (Conv2DT (None, 12, 10, 1) 17
=================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
_________________________________________________________________
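Training this model could then follow the usual Keras pattern. A minimal sketch (my addition, with hypothetical random binary data standing in for the real matrices):
import numpy as np

# Hypothetical stand-in data: 1000 random binary input/output matrix pairs
X = np.random.randint(0, 2, (1000, 10, 3, 1)).astype('float32')
Y = np.random.randint(0, 2, (1000, 12, 10, 1)).astype('float32')

# MSE treats the output as a real-valued grid; for strictly 0/1 outputs,
# a sigmoid on the last layer plus binary cross-entropy may fit better
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=10, batch_size=32, validation_split=0.2)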
Obviously, feel free to add activations, dropouts, pooling layers, etc. The above code just shows how you can use downsampling and upsampling to get from a given single-channel image to another single-channel image.
On a side note: I would really advise spending some time understanding how CNNs work. Deconv nets are complex, and tackling a problem that involves them before properly understanding how 2D CNNs work can cause foundational problems, especially if you are just starting to learn this domain.
You can use 1D convolutional layers if you want to convolve along a single spatial dimension, which, from what I understood, is what you want.
e.g.
import tensorflow as tf

# assuming a 3x10 matrix with a single batch
input_shape = (1, 3, 10)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=input_shape[1:])(x)

LSTM predicting constant value throughout

I understand that it is a long post, but help in any of the sections is appreciated.
I have some queries about the prediction method of my LSTM model. Here is a general summary of my approach:
I used a dataset of 50 time series for training. They start at a value of 1.09 and decay down to 0.82, with each time series having between 570 and 2000 datapoints (i.e., each time series has a different length, but a similar trend).
I converted them to the dataset format accepted by Keras' LSTM/Bi-LSTM layers (see the windowing sketch after this list):
[1, 0.99, 0.98, 0.97] ==Output==> [0.96]
[0.99, 0.98, 0.97, 0.96] ==Output==> [0.95]
and so on..
Shapes of the input and output containers (arrays): input(39832, 5, 1) and output(39832, )
Error-free training
Prediction starts from an initial window of data points of shape (1, 5, 1), taken from the actual data.
The predicted output is a single value, which is appended to a separate list (for plotting) as well as to the window, while the first value of the window is dropped. This window is then fed back as input to the model to generate the next prediction point.
Continue this until I get the whole curve for both models (LSTM and Bi-LSTM)
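For reference, here is a minimal sketch of the windowing in step 2 (my reconstruction; make_windows is a hypothetical helper, with timesteps = 5 to match the shapes stated above):
import numpy as np

def make_windows(series, timesteps=5):
    # Slide a fixed-length window over one series: each window of `timesteps`
    # consecutive values predicts the value that immediately follows it
    X, y = [], []
    for i in range(len(series) - timesteps):
        X.append(series[i:i + timesteps])
        y.append(series[i + timesteps])
    return np.array(X).reshape(-1, timesteps, 1), np.array(y)

# e.g. one synthetic series decaying from 1.09 down to 0.82
series = np.linspace(1.09, 0.82, 600)
X, y = make_windows(series)
print(X.shape, y.shape)  # (595, 5, 1) (595,)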
However, the prediction is not even close to the actual data. It flatlines to a fixed value, whereas it should look somewhat like the black curve (which is the actual data).
Figure: https://i.stack.imgur.com/Ofw7m.png
Model (similar code goes for Bi-LSTM model):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras import optimizers

timesteps = 5  # window length, matching the input shape (39832, 5, 1)

model_lstm = Sequential()
model_lstm.add(LSTM(128, input_shape=(timesteps, 1), return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(128, return_sequences=False))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(1))
model_lstm.compile(loss='mean_squared_error', optimizer=optimizers.Adam(0.001))
Curve prediction initialization:
start = cell_to_test[0:timesteps].reshape(1, timesteps, 1)
y_curve_lstm = list(start.flatten())
y_window = start
Curve prediction:
while len(y_curve_lstm) <= len(cell_to_test):
    yhat = model_lstm.predict(y_window)
    yhat = float(yhat)
    y_curve_lstm.append(yhat)
    y_window = list(y_window.flatten())
    y_window.append(yhat)
    y_window.pop(0)  # drop the oldest value from the window
    y_window = np.array(y_window).reshape(1, timesteps, 1)
    # print(yhat)
Model summary:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_5 (LSTM) (None, 5, 128) 66560
_________________________________________________________________
dropout_5 (Dropout) (None, 5, 128) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_6 (Dropout) (None, 128) 0
_________________________________________________________________
dense_5 (Dense) (None, 1) 129
=================================================================
Total params: 198,273
Trainable params: 198,273
Non-trainable params: 0
_________________________________________________________________
And in addition to diagnosing the problem, I am really trying to find the answers to the following questions (I looked up other sources, but in vain):
Is my data enough to train the LSTM model? I have been told that it requires thousands of data points, so I feel that my current dataset more than satisfies that condition.
Is my model less/more complex than it needs to be?
Does increasing the number of epochs, layers, and neurons per layer always lead to a 'better' model, or are there optimal values for these? If the latter, is there a method to find this optimal point, or is trial and error the only way?
I trained with epochs=25, which gave me a loss of 1.25 * 10e-4. Should the loss be lower for the model to predict the trend? (I am focused on getting the shape right first and the accuracy later, because training takes too long with more epochs.)
In continuation of the previous question, does the loss have the same unit as the data? I ask because the data has a resolution of up to 10e-7.
Once again, I understand that it has been a long post, but help in any of the sections is appreciated.

Creating tensor of dynamic shape from python lists to feed tensorflow RNN

I'm creating an end-to-end speech recognition architecture, in which my data is a list of segmented spectrograms. My data has shape (batch_size, timesteps, 8, 65, 1), where batch_size is fixed but timesteps varies. I can't figure out how to put this data into a tensor with the appropriate shape to feed my model. Here is a piece of code that shows my problem:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.layers import SimpleRNN, LSTM
from tensorflow.keras import Input, layers
from tensorflow.keras import backend as K

segment_width = 8
segment_height = 65
segment_channels = 1
batch_size = 4
segment_lengths = [28, 33, 67, 43]
label_lengths = [16, 18, 42, 32]
TARGET_LABELS = np.arange(35)

# Generating data
X = [np.random.uniform(0, 1, size=(segment_lengths[k], segment_width, segment_height, segment_channels))
     for k in range(batch_size)]
y = [np.random.choice(TARGET_LABELS, size=label_lengths[k]) for k in range(batch_size)]

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data',
                                     shape=(None, segment_width, segment_height, segment_channels),
                                     dtype='float32')
input_segment_lengths = tf.keras.Input(name='input_segment_lengths', shape=[1], dtype='int64')
input_label_lengths = tf.keras.Input(name='input_label_lengths', shape=[1], dtype='int64')

# More complex architecture comes here
outputs = Flatten()(input_segments_data)
model = tf.keras.Model(inputs=[input_segments_data, input_segment_lengths, input_label_lengths],
                       outputs=outputs)

def dummy_loss(y_true, y_pred):
    return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
output:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_segments_data (InputLayer [(None, None, 8, 65, 0
__________________________________________________________________________________________________
input_segment_lengths (InputLay [(None, 1)] 0
__________________________________________________________________________________________________
input_label_lengths (InputLayer [(None, 1)] 0
__________________________________________________________________________________________________
flatten (Flatten) (None, None) 0 input_segments_data[0][0]
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________
Now when I try to predict from my random data:
model.predict([X, segment_lengths, segment_lengths])
I get this error:
ValueError: Error when checking input: expected input_segments_data to have 5 dimensions, but got array with shape (4, 1)
How can I convert X (which is a list of arrays) to a tensor of shape (None, None, 8, 65, 1) and feed it to my model? I don't want to use zero padding!
Keras models take numpy arrays (tensors) as input. You cannot have a tensor with a variable number of timesteps. Instead, what you can do is pad all the data to the same shape, using e.g. pad_sequences, and then add a Masking layer to your model to ignore the padded values.
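A minimal sketch of that suggestion (my addition, assuming the X and input_segments_data defined in the question, and a tf.keras version whose pad_sequences preserves trailing dimensions, as recent ones do):
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad every sequence with zeros up to the longest one (67 timesteps here)
X_padded = pad_sequences(X, padding='post', dtype='float32')
print(X_padded.shape)  # (4, 67, 8, 65, 1)

# Downstream, a Masking layer lets later layers skip the all-zero timesteps
masked = tf.keras.layers.Masking(mask_value=0.0)(input_segments_data)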
This is a common issue with TensorFlow and other deep learning frameworks that operate on tensors. Unfortunately, there is currently no easy way to do exactly what you asked, besides padding your sequences and then masking.
To do this, you simply have to store your input data in a numpy array with fixed dimensions and feed that to the model. You have to add dummy values to represent the missing timesteps in your sequences (a common value is 0).
Then, you have to add a Masking layer to your model, that will tell Keras to ignore the timesteps that have the dummy features.
From the documentation:
keras.layers.Masking(mask_value=0.0)
If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking).
I've adapted and simplified part of your code to give you an idea of how this works. You can adapt this to your variable-sized labels, as well:
# Generating data (using a dummy zero-array to store padded sequences)
X = np.zeros((batch_size, max(segment_lengths), segment_width, segment_height, segment_channels))
X_true = [np.ones((segment_lengths[k], segment_width, segment_height, segment_channels))
          for k in range(batch_size)]

# Populate dummy array
for i, x in enumerate(X_true):
    X[i, -segment_lengths[i]:, ...] = x

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data',
                                     shape=(max(segment_lengths), segment_width, segment_height, segment_channels))
masked_segments_data = tf.keras.layers.Masking()(input_segments_data)

# More complex architecture comes here (it should consume masked_segments_data
# so the mask propagates; the Flatten below is only a placeholder)
outputs = tf.keras.layers.Flatten()(input_segments_data)
model = tf.keras.Model(inputs=input_segments_data, outputs=outputs)

def dummy_loss(y_true, y_pred):
    return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
A drawback of this approach is that if you actually have a "real" feature that is exactly like a dummy feature (e.g., all zeros), the model will mask it. Choose your masking value appropriately to avoid this.
An alternative approach would be to do something similar to what you did, but using batches of size 1. This, however, is likely to cause instability in your training procedure, and I would avoid it if possible.
As a final note, TensorFlow 2 added support for RaggedTensors, which are tensors with one or more variable dimensions. Currently there is no RNN support for them, but it will probably be added eventually.
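For illustration, a minimal sketch (my addition) of packing a variable-length list of segment arrays, like X_true above, into a RaggedTensor:
import numpy as np
import tensorflow as tf

# Concatenate all segments along time, then split back by per-sequence length,
# giving a ragged shape of (4, None, 8, 65, 1)
values = np.concatenate(X_true, axis=0).astype('float32')
rt = tf.RaggedTensor.from_row_lengths(values, row_lengths=segment_lengths)
print(rt.shape)  # (4, None, 8, 65, 1)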
Hope this helps.

Keras SimpleRNN confusion

...coming from TensorFlow, where pretty much every shape and everything is defined explicitly, I am confused about Keras' API for recurrent models. Getting an Elman network to work in TF was pretty easy, but Keras refuses to accept the correct shapes...
For example:
import keras as k

x = k.layers.Input(shape=(2,))
y = k.layers.Dense(10)(x)
m = k.models.Model(x, y)
...works perfectly and according to model.summary() I get an input layer with shape (None, 2), followed by a dense layer with output shape (None, 10). Makes sense since Keras automatically adds the first dimension for batch processing.
However, the following code:
x = k.layers.Input(shape=(2,))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
raises an exception ValueError: Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2.
It works only if I add another dimension:
x = k.layers.Input(shape=(2,1))
y = k.layers.SimpleRNN(10)(x)
m = k.models.Model(x, y)
...but now, of course, my input would not be (None, 2) anymore.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 2, 1) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, 10) 120
=================================================================
How can I have an input of type batch_size x 2 when I just want to feed vectors with 2 values to the network?
Furthermore, how would I chain RNN cells?
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...raises the same exception with incompatible dim sizes.
This sample here works:
x = k.layers.Input(shape=(2, 1))
h = k.layers.SimpleRNN(10, return_sequences=True)(x)
y = k.layers.SimpleRNN(10)(h)
m = k.models.Model(x, y)
...but then layer h does not output (None, 10) anymore, but (None, 2, 10) since it returns the whole sequence instead of just the "regular" RNN cell output.
Why is this needed at all?
Moreover: where are the states? Do they just default to 1 recurrent state?
The documentation touches on the expected shapes of recurrent components in Keras; let's look at your case:
Any RNN layer in Keras expects a 3D shape (batch_size, timesteps, features). This means you have timeseries data.
The RNN layer then iterates over the second (time) dimension of the input using a recurrent cell, which performs the actual recurrent computation.
If you specify return_sequences, then you collect the output for every timestep, getting another 3D tensor (batch_size, timesteps, units); otherwise you only get the last output, which is (batch_size, units).
Now returning to your questions:
You mention vectors, but shape=(2,) is a single vector, so this doesn't work. shape=(2, 1) works because now you have 2 vectors of size 1; these shapes exclude batch_size. So to feed vectors of size 2 you need shape=(how_many_vectors, 2), where the first dimension is the number of vectors you want your RNN to process, i.e. the timesteps in this case.
To chain RNN layers you need to feed 3D data, because that is what RNNs expect. When you specify return_sequences, the RNN layer returns the output at every timestep, so it can be chained to another RNN layer.
States are the collection of vectors that an RNN cell uses; LSTM uses 2, GRU has 1 hidden state, which is also the output. They default to zeros but can be specified when calling the layer using initial_state=[...] as a list of tensors.
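As an illustration of that last point, a minimal sketch (my addition) of passing an explicit initial state to a SimpleRNN:
import keras as k

x = k.layers.Input(shape=(2, 1))   # 2 timesteps, 1 feature each
s0 = k.layers.Input(shape=(10,))   # one initial state vector for SimpleRNN
y = k.layers.SimpleRNN(10)(x, initial_state=[s0])
m = k.models.Model([x, s0], y)
m.summary()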
There is already a post about the difference between RNN layers and RNN cells in Keras which might help clarify the situation further.
