I am working on a model that uses several different collections of features. There is one NN for each set of features, but they all have the same structure. The NNs are built as follows:
results = []
sources = [input1, input2, ...]
for src in sources:
    # one independent Dense layer per input
    result = Dense(25)(src)
    results.append(result)
model = Model(inputs=sources, outputs=results)
I do have the model working such that it will compile and train.
My question is, since each component is separate, will the individual dense layers train using the loss from their corresponding y array? Or are all of the NNs trained using the combined loss?
I am hoping to keep all the NNs together like this if possible as they will always be run together.
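The variables in each NN are independent of one another, so although Keras sums the per-output losses into one combined loss, the gradient of that sum with respect to a given NN's weights depends only on that NN's own loss term. Each dense layer is therefore effectively trained on the loss from its corresponding y array. What you will have to be careful of is that you will have N output tensors, where N is the number of different neural networks you have (i.e. the length of the results list).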
You will have to make sure that your labels are in the same structure as the output of the model manually or create a custom loss function to handle this. One way to do this is to merge the outputs using tf.stack or tf.concat and then duplicate the labels to match these.
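To make the independence argument concrete, here is a minimal sketch in plain Python rather than Keras. It assumes two toy "networks" that each consist of a single scalar weight (the names loss_1, loss_2 and the data values are made up for illustration) and checks numerically that the gradient of the summed loss with respect to the first weight equals the gradient of the first network's own loss.

```python
# Two toy "networks", each a single scalar weight: pred_i = w_i * x_i.
# The combined loss is the sum of the two squared errors, which mirrors how
# the per-output losses of a multi-output model are added together.

def loss_1(w1, x1=2.0, y1=3.0):
    return (w1 * x1 - y1) ** 2

def loss_2(w2, x2=-1.0, y2=0.5):
    return (w2 * x2 - y2) ** 2

def combined_loss(w1, w2):
    return loss_1(w1) + loss_2(w2)

def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w1, w2 = 0.7, -0.2

# Gradient of the combined loss with respect to w1 (w2 held fixed) ...
grad_combined_w1 = numerical_grad(lambda w: combined_loss(w, w2), w1)
# ... versus the gradient of network 1's own loss alone.
grad_own_w1 = numerical_grad(loss_1, w1)

# The two agree up to floating-point error, because loss_2 does not depend on w1.
print(abs(grad_combined_w1 - grad_own_w1) < 1e-6)
```

The same reasoning applies to the real model as long as the branches share no layers; as soon as two branches share a layer, that layer receives gradient from both losses.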
I have individually trained the same neural network architecture on a large number of different datasets (on the order of hundreds) to learn a unique non-linear function for each, i.e. I have basically learned a set of weights that describes the function for each dataset.
Now, I want to use these sets of weights as a pre-trained layer in another optimization problem. I know how to load a single saved model and employ it as a layer. However, what I will be doing is a group-wise optimization across the hundreds of different datasets, where I have pre-trained weights for each (from above).
So the setup is a batch of X datasets, each with N data points in D dimensions, i.e. the input data has the shape [X, N, D]. There is a series of layers which act on all this data; then, when it gets to the "pre-trained" layer, I wish to use different pre-trained weights, i.e. [0,:,:] uses the weights learned from dataset 0 above, [1,:,:] uses the weights learned from dataset 1, and so on.
I then need to combine the outputs of all of this, as the loss function for this group-wise optimization is based on the variance across all datasets. So I don't believe I can simply evaluate one set, calculate the loss, change weights, rinse and repeat, and sum up at the end.
I doubt it is feasible to have massive duplicate branches, with X copies of the pre-trained NN layers, as the pre-trained NN architecture is already quite complex.
Is it possible to use a split layer followed by a for-loop type approach, in which I change the weights and then pass the correct portion of data through, and then merge all the outputs? Or is there a better way of tackling this?
Any help much appreciated.
I have trained a CNN on Fashion MNIST data with the following configuration:
Conv-Pool-Dropout-Conv-Pool-Dropout-Flat-Dense-Dropout-Output
I would like to change the configuration to:
Conv-Clustering-Pool-Dropout-Conv-Clustering-Pool-Dropout-Flat-Dense-Clustering-Dropout-Output
However, I want this new configuration only for testing and not training the model (I can use the weights from the trained model and set them for the model with Clustering configuration). Is there a way to add the Clustering layer using tensorflow?
I would like to represent the output of the Conv and Dense layers using the cluster centroids to examine the effects on the accuracy of the model.
What you are asking is in fact two questions:
1. How can I apply an operation only in some configurations and not others?
2. How can I apply k-means?
And here are the answers:
1. To apply an operation based on the value of some TensorFlow variable do_clustering, you can use tf.cond:
maybe_clustered_output = tf.cond(do_clustering,
                                 lambda: my_clustering_operation(input),
                                 lambda: input)
2. To apply k-means clustering, you can use tf.compat.v1.estimator.experimental.KMeans (and use it in place of my_clustering_operation in the above snippet). Note that it is an Estimator, so it has to be fitted first, for example on the layer's activations; the fitted cluster centers can then be used inside my_clustering_operation.
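As a rough illustration of what "representing the output by cluster centroids" could mean, here is a minimal plain-Python sketch, not TensorFlow code. It assumes scalar activations and centroids that have already been computed elsewhere; the function names and the numbers are hypothetical.

```python
def nearest_centroid(value, centroids):
    """Return the centroid closest to a single activation value."""
    return min(centroids, key=lambda c: abs(c - value))

def maybe_cluster(activations, centroids, do_clustering):
    """Replace each activation by its nearest centroid when do_clustering is set."""
    if not do_clustering:
        return list(activations)
    return [nearest_centroid(a, centroids) for a in activations]

# Hypothetical activations and centroids, chosen only for illustration.
activations = [0.05, 0.48, 0.91, 0.30]
centroids = [0.0, 0.5, 1.0]

# Test-time configuration: activations are snapped to centroids.
clustered = maybe_cluster(activations, centroids, do_clustering=True)
# Training-time configuration: activations pass through untouched.
unchanged = maybe_cluster(activations, centroids, do_clustering=False)
```

In the real model the same switch would be the tf.cond call, with the nearest-centroid lookup playing the role of my_clustering_operation.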
Good luck :)
I'm trying to do text classification with scikit-learn.
I have text that does not classify well. I think I can improve the predictions by adding data I can deduce in the form of an array of integers.
For example, sample 1 would come with [3, 1, 5, 2] and sample 2 would come with [2, 1, 4, 2]. This would also be true of the test data.
The idea is that the classifier could use both the text and the numbers to classify the data.
I've read the documentation for scikit-learn and I can't find how to do it. It must be possible, because all that is classified internally is vectors of numbers. So adding another vector of numbers should not be that much of a problem, but I can't figure out how. partial_fit adds more samples; it does not add more information about the existing samples. Is there a way to do what I'm trying to do? I tried to combine GaussianNB with SGDClassifier, but it turns out I don't know how to do that. (Was it a bad idea?)
What should I do?
I think you could add this new feature as another dimension to your training data. You need to modify the training data by adding your new features before calling SGD.
A simple/naive way would be:
For example, if my training data with two samples were
X = [[1, 2, 3], [8, 9, 0]]
And my new features for each sample were
new_feature_X = [[11, 22, 33], [77, 88, 0]]
My new training data would be:
X_new = [[1, 2, 3, 11, 22, 33], [8, 9, 0, 77, 88, 0]]
Then you call SGD.fit(X_new, labels)
As far as my SGD knowledge goes, I don't think there is any other way to combine two features.
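The concatenation described above can be sketched in a few lines of plain Python. The helper name add_features is made up for this example, and the data is the toy data from this answer. With real text features from a vectorizer, which are usually sparse matrices, scipy.sparse.hstack is the usual equivalent, and the same step has to be applied to the test data.

```python
# Toy data taken from the example above: two samples, three original features each.
X = [[1, 2, 3], [8, 9, 0]]
# Extra hand-crafted features for the same two samples, in the same order.
new_feature_X = [[11, 22, 33], [77, 88, 0]]

def add_features(rows, extra_rows):
    """Append each sample's extra features to its existing feature vector."""
    if len(rows) != len(extra_rows):
        raise ValueError("both inputs must describe the same samples")
    return [row + extra for row, extra in zip(rows, extra_rows)]

X_new = add_features(X, new_feature_X)
print(X_new)
```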
"The idea is that the classifier could use both the text and the numbers to classify the data."
I find a neural network to be much more suitable for this. You could use two input layers, one for text vectors, and one for the numbers and feed them together into a network to get the output.
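"I tried to combine GaussianNB with SGDClassifier, but it turns out I don't know how to do that. (Was it a bad idea?)"
SGD stands for stochastic gradient descent, which minimizes a cost function by following its gradient. Is it possible to find the gradient of Naive Bayes? What is the corresponding cost function? GaussianNB is fitted by estimating per-class means and variances directly rather than by gradient descent, so it cannot simply be plugged into SGDClassifier.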
"What should I do?"
1. Ensemble. Train two separate classifiers, one on your text data and another on your new handcrafted features, then average their predicted probabilities. You could also train multiple classifiers and take a vote. This tutorial is great for that.
2. Try out MLPClassifier. I used it a while ago and found it works pretty well with text.
3. Neural networks. It's pretty easy with Keras; see the Keras code below.
4. Read research literature. There is a pretty good chance academia has already done some work on your dataset. Try to read some of it. Google Scholar and Semantic Scholar are great places to find published research.
from keras.layers import Input, Dense, Concatenate
from keras.models import Model
# This returns a tensor
text_input_vec = Input(shape=(784,))
new_numeric_feature = Input(shape=(4,))
# feed your text to a dense layer
dense1 = Dense(64, activation='relu')(text_input_vec)
# feed your numeric feature to another dense layer
dense2 = Dense(64, activation='relu')(new_numeric_feature)
# concatenate/combine the output of both
concat = Concatenate(axis=-1)([dense1,dense2])
# use the above to predict the label of your text. Layer below
# assumes you have 2 classes
predictions = Dense(2, activation='softmax')(concat)
model = Model(inputs=[text_input_vec, new_numeric_feature], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
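The ensemble suggestion (averaging the predicted probabilities of two classifiers) can be sketched as follows. This is only an illustration in plain Python: the probability values are hypothetical stand-ins for what two fitted classifiers might return from predict_proba, and the function names are made up.

```python
def average_probabilities(probs_a, probs_b):
    """Average two per-class probability lists for a single sample."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]

def predict_label(probs):
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical predict_proba-style outputs for one sample and two classes.
text_clf_probs = [0.40, 0.60]      # classifier trained on the text
numeric_clf_probs = [0.90, 0.10]   # classifier trained on the integer features

ensemble_probs = average_probabilities(text_clf_probs, numeric_clf_probs)
label = predict_label(ensemble_probs)
```

Here the text classifier alone would pick class 1, but the numeric classifier is confident enough to tip the averaged prediction to class 0, which is the kind of correction the extra features are meant to provide.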
Say we train a multilayer NN in tensorflow for a regression task (i.e. multi input and multi output case). Then we have new instances and we apply the trained model and of course we get the corresponding outputs. Is there a way to backpropagate the outputs and reconstruct the inputs in tensorflow in an easy/efficient manner? What I am thinking is to then use the difference of the original and the reconstructed inputs of the new instances as a QC measure i.e. if the reconstructed inputs are not close enough to the originals then we have a problem etc. I hope I am making myself clear.
No, unfortunately you cannot in general take a trained model and recover the input that produced a given output. The reason is that the mapping from inputs to outputs is usually many-to-one, so there can be infinitely many possible inputs for each output.
Furthermore, backpropagation does not pass an output backwards through the network. It determines how much each parameter in the model contributes to the loss function. This will not give you the inputs to the hidden layers, only the extent to which the weights affected the result.
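A tiny example of why the output does not pin down the input: the sketch below uses a single ReLU unit with made-up weights (not the asker's regression model) and shows several distinct inputs that produce exactly the same output.

```python
def relu(z):
    return max(0.0, z)

def tiny_network(x, weights=(1.0, 1.0), bias=0.0):
    """One ReLU unit: two inputs in, one output out (hypothetical weights)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return relu(z)

# Three different inputs that the network cannot tell apart from the output alone.
out_a = tiny_network((1.0, 2.0))
out_b = tiny_network((2.0, 1.0))
out_c = tiny_network((3.0, 0.0))

# Any input with a negative pre-activation collapses to the same output as well.
out_d = tiny_network((-1.0, -4.0))
out_e = tiny_network((-7.0, 2.0))
```

Both effects (summing inputs and clipping at zero) discard information, so no procedure can tell from the output which of these inputs was used.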
Currently I'm using VGG16 + Keras + Theano through the transfer learning methodology to recognize plant classes. It works just fine and gives me good accuracy. But the next problem I'm trying to solve is to find a way of identifying whether an input image contains a plant at all. I don't want to have another classifier to do it, because that is not really efficient.
So I did some searching and found that we can get the activations from the last model layer (before the activation layer) and analyze them.
from keras import backend as K

model = util.load_model()  # VGG16 model
model.load_weights(path_to_weights)

def get_activations(m, layer, X_batch):
    # backend function mapping the model input to the chosen layer's output
    x = [m.layers[0].input, K.learning_phase()]
    y = [m.get_layer(layer).output]
    activation_fn = K.function(x, y)
    # learning phase 0 = test mode
    activations = activation_fn([X_batch, 0])
    # trying to get some features from activations
    # to understand how we can identify if an image is relevant
    for l in activations[0]:
        not_nulls = [a for a in l if a > 0]
        # fraction of activated neurons
        c1 = float(len(not_nulls)) / len(l)
        n_activated = len(not_nulls)
        print('c1:{}, n_activated:{}'.format(c1, n_activated))
    return activations

get_activations(model, 'the_latest_layer_name', inputs)
From the above code I've noticed that for a very irrelevant image, the number of activated neurons is larger than for images that contain plants:
For images that were used for model training, the share of activated neurons is 19%-23%
For images that contain unknown plant species, 20%-26%
For irrelevant images, 24%-28%
It's not really a good feature for deciding whether an image is relevant, as the percentage ranges overlap.
So, is there a good way to resolve this issue?
Thanks to Feras's idea in the comment above. After some trials, I've come up with a solution that solves this problem with accuracy up to 99.99%.
Steps are:
1. Train your model on a dataset.
2. Store activations (see the method above for how to get them) by predicting relevant and non-relevant images using the trained model from the previous step. You should get activations from the penultimate layer. For VGG16 it's the last of the two Dense(4096) layers, for InceptionV3 an extra penultimate Dense(1024) layer, for ResNet50 an extra penultimate Dense(2048) layer.
3. Solve a binary classification problem using the stored activations. I've tried a simple flat NN and Logistic Regression. Both were good in accuracy (the flat NN was a bit more accurate), but I've chosen Logistic Regression as it's simpler, faster and consumes less memory and CPU/GPU.
This process should be repeated each time your model is retrained, because the final CNN weights are different each time and a binary model fitted to the previous activations will no longer match.
So as a result we have another small model for solving the problem.
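Step 3 can be sketched as below. This is a minimal, self-contained illustration rather than the code actually used: it hand-rolls logistic regression in plain Python (in practice scikit-learn's LogisticRegression would be the natural choice), and the "activations" are synthetic 4-dimensional vectors standing in for the stored penultimate-layer activations, so the numbers say nothing about real accuracy.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, xi):
    """Return 1 (relevant) or 0 (irrelevant) for one activation vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

# Synthetic stand-in for stored penultimate-layer activations (4 units only).
random.seed(0)
relevant = [[random.uniform(0.6, 1.0), random.uniform(0.0, 0.3),
             random.uniform(0.5, 1.0), random.uniform(0.0, 0.2)] for _ in range(20)]
irrelevant = [[random.uniform(0.0, 0.3), random.uniform(0.6, 1.0),
               random.uniform(0.0, 0.2), random.uniform(0.5, 1.0)] for _ in range(20)]
X = relevant + irrelevant
y = [1] * len(relevant) + [0] * len(irrelevant)

w, b = train_logistic_regression(X, y)
train_accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

At prediction time an image would first go through the CNN to obtain its penultimate-layer activations, and predict would then decide whether the plant classifier's answer should be trusted at all.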