Feature weights - python

Is there a way to give weights to features before training a neural network model? The meaning of weight here would be similar to that in linear regression - a representation of how much to trust each feature. This is unrelated to node weights or feature importance estimation after training.
Something similar to this publication.

Related

Neural network Hyper-parameters Optimization and Sensitivity Analysis

I am working on a very large dataset in Keras with a single-output neural network. After changing the depth of the network, I observed some improvement in the model's performance. Therefore, I now want to perform a systematic, research-grade hyper-parameter optimization (hidden layers, activation functions, number of neurons, epochs, batch size, etc.). However, I was told that GridSearchCV and RandomizedSearchCV are not suitable options because my dataset is large. I was wondering whether any of you have experience with this, or feedback that could point me in the right direction.
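Use a confusion matrix and a heat map to evaluate the classification performance of your network:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
# Convert predicted probabilities and one-hot test labels to class indices
Y_pred = model.predict(X_test)
Y_pred2 = np.argmax(Y_pred, axis=1)
Y_test2 = np.argmax(Y_test, axis=1)
cm = confusion_matrix(Y_test2, Y_pred2)
sns.heatmap(cm, annot=True, fmt="d")  # annotate each cell with its count
plt.show()
print(classification_report(Y_test2, Y_pred2, target_names=label_names))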
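On the search itself: one option for large datasets, offered here as an assumption rather than something proposed in the thread, is successive halving. It evaluates many configurations on small subsamples and gives more data only to the most promising ones. The sketch below uses scikit-learn's HalvingRandomSearchCV (still experimental, so it must be enabled explicitly), with MLPClassifier and make_classification data as hypothetical stand-ins for the Keras model and the real dataset; the parameter grid is illustrative only.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the class below)
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.neural_network import MLPClassifier

# Short training runs are expected not to converge; silence that warning for the demo
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for the real (large) dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Illustrative search space: depth/width, activation, L2 penalty, batch size
param_distributions = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "activation": ["relu", "tanh"],
    "alpha": [1e-5, 1e-4, 1e-3],
    "batch_size": [32, 128],
}

search = HalvingRandomSearchCV(
    MLPClassifier(max_iter=100, random_state=0),  # max_iter = number of epochs for adam
    param_distributions,
    n_candidates=8,         # configurations sampled in the first round
    factor=2,               # keep the best half each round, doubling their samples
    resource="n_samples",   # the budget that grows between rounds is training data
    min_resources=125,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Most candidates are only ever trained on a fraction of the data, which is what keeps the cost down. For a Keras model specifically, the Keras Tuner library provides a Hyperband tuner built on a similar idea.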

Autoencoder + K-means for clustering

I am implementing a feed-forward neural network for a specific clustering problem.
I'm not sure if it is possible or even makes sense, but the network consists of multiple layers followed by a clustering layer (say, k-means) used to calculate the clustering loss.
The NN layers act as a feature extractor, while the last layer is only used to calculate the loss (for example, by calculating the similarity score among different data points).
In fact, this network is part of a larger autoencoder, similar to the one discussed in this paper.
The question is: can I define a custom loss function in TensorFlow/Keras that receives the output of the NN and computes the clustering loss? If so, how?
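A clustering loss of this kind can be written with ordinary tensor operations, which is what makes a custom loss possible. Below is a NumPy sketch of one common choice, assumed here rather than taken from the paper: a k-means-style loss, the mean squared distance from each embedding to its nearest centroid. The centroids are treated as fixed inputs (an assumption; in DEC-style models they are trainable), and the function name kmeans_loss is hypothetical. In Keras, the same steps map onto tf.reduce_sum, tf.reduce_min and tf.reduce_mean inside a function with the usual loss(y_true, y_pred) signature, where y_true can simply be ignored.

```python
import numpy as np

def kmeans_loss(embeddings, centroids):
    """Mean squared Euclidean distance from each embedding to its nearest centroid.

    embeddings: (n_samples, d) output of the feature-extractor layers
    centroids:  (k, d) cluster centres (held fixed in this sketch)
    """
    # Pairwise differences via broadcasting: (n_samples, k, d)
    diff = embeddings[:, None, :] - centroids[None, :, :]
    # Squared distances to every centroid: (n_samples, k)
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Hard assignment: each point only contributes its distance to the closest centroid
    return np.mean(np.min(sq_dist, axis=1))

emb = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
cent = np.array([[0.0, 0.0], [10.0, 10.0]])
print(kmeans_loss(emb, cent))  # distances 0, 1, 0 -> mean 1/3
```

The min over centroids is not smooth where two centroids are equally close, but gradient-based training handles it in practice via subgradients; soft-assignment variants replace it with a weighted average.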

EnsembleVoteClassifier with neural network

I have three trained neural networks, and I am trying to average their predictions using EnsembleVoteClassifier from mlxtend.classifier. The problem is that my networks don't share the same input: I applied different feature-reduction and feature-selection algorithms and stored the results in different variables, so I have X_test_algo1, X_test_algo2, X_test_algo3 and Y_test.
I am trying to average the predictions, but as I said, I don't have the same X, and I didn't find any example in the documentation. How can I average the predictions of my three models model1, model2 and model3?
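from mlxtend.classifier import EnsembleVoteClassifier
eclf = EnsembleVoteClassifier(clfs=[model1, model2, model3], weights=[1, 1, 1], refit=False)
names = ['NN1', 'NN2', 'NN3', 'Ensemble']
eclf.fit(X_train_algo1, Ytrain)  # ???? which X should be passed when each model expects a different one?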
If it's not possible, that is okay. I am mainly interested in how to compute Hard Voting, Soft Voting and Weighted Voting, so another, more flexible library or the explicit formulas would also be helpful.
Why would you need a library to do that?
Simply pass the same examples through all your neural networks and get the predictions (either logits or probabilities or labels).
Hard voting: choose the label predicted most often by the classifiers.
Soft voting: average the probabilities predicted by the classifiers and choose the label with the highest average.
Weighted voting: either of the above can be weighted. Assign a weight to each classifier and multiply its votes or probabilities by that weight. Weights are usually normalized to the (0, 1] range.
In principle, you could also sum the logits and choose the label with the highest sum.
Oh, and weight averaging is a different technique: it requires the same model architecture and is usually done for the same initialization, taken at different training timesteps. You can read about it in this blog post.
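The hard, soft and weighted voting rules described above can be written out explicitly in a few lines of NumPy. This is a minimal sketch, not mlxtend's implementation; the function names hard_vote and soft_vote are hypothetical, and the commented-out usage assumes each Keras model returns class probabilities from predict. Because each model is called on its own feature set, the different X matrices are no obstacle.

```python
import numpy as np

def hard_vote(labels, weights=None):
    """Weighted majority vote: y = argmax_c sum_m w_m * [label_m == c].

    labels:  (n_models, n_samples) integer class labels, one row per model
    weights: optional (n_models,) per-model weights; equal weights by default
    """
    labels = np.asarray(labels)
    n_models, n_samples = labels.shape
    w = np.ones(n_models) if weights is None else np.asarray(weights, dtype=float)
    votes = np.zeros((n_samples, labels.max() + 1))
    for m in range(n_models):
        # each model adds its weight to the class it predicted for each sample
        votes[np.arange(n_samples), labels[m]] += w[m]
    return votes.argmax(axis=1)  # ties go to the lowest class index

def soft_vote(probas, weights=None):
    """Weighted average probability: y = argmax_c sum_m w_m * p_m[c] / sum_m w_m.

    probas: (n_models, n_samples, n_classes) predicted class probabilities
    """
    probas = np.asarray(probas, dtype=float)
    w = np.ones(probas.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), probas, axes=1)  # (n_samples, n_classes)
    return avg.argmax(axis=1)

# Usage with the models from the question (each gets its own feature set):
# probas = [m.predict(X) for m, X in zip([model1, model2, model3],
#                                         [X_test_algo1, X_test_algo2, X_test_algo3])]
# labels = [p.argmax(axis=1) for p in probas]
# y_hard = hard_vote(labels)
# y_soft = soft_vote(probas, weights=[1, 1, 2])
```

Normalizing the weights does not change the argmax; it only keeps the averaged values interpretable as probabilities.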

Interpretability of MLPClassifier

I have a question regarding the interpretability of machine learning algorithms.
I have a dataset looking like this:
[Image: tabular data set]
I have trained a classification model (MLPClassifier from Scikit-Learn) and want to know which features have the biggest impact (the highest weight) on its decisions.
My final goal is to find different solutions (combinations of feature values) that have a high probability (>90%) of being classified as 1.
Does somebody know a way to get these solutions?
Thanks in advance!
If you want built-in feature importances, use a tree-based model such as a random forest or a decision tree, both implemented in sklearn:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X, y)
# After fitting: impurity-based importance of each feature (normalized to sum to 1)
clf.feature_importances_
The feature importances tell you how much weight each feature carries in the forest. If your MLP classifier is trained properly, it may rely on broadly similar features, although this is not guaranteed, since the two models learn in different ways.
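Permutation importance is a model-agnostic alternative (a swapped-in technique, not the one the answer above uses) that works directly with a trained MLPClassifier: it measures how much the test score drops when one feature's values are shuffled. The second half of the sketch targets the ">90% probability" goal by brute force: sample candidate inputs uniformly within the observed feature ranges and keep those the model scores above 0.9. The make_classification data, the network size and the uniform sampling range are all assumptions standing in for the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the tabular dataset in the question
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Drop in test accuracy when each feature is shuffled, averaged over 10 repeats
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")

# Sample candidate inputs within the observed range of each feature
rng = np.random.default_rng(0)
candidates = rng.uniform(X.min(axis=0), X.max(axis=0), size=(10_000, X.shape[1]))
proba_1 = clf.predict_proba(candidates)[:, 1]  # column 1 = P(class 1)
good = candidates[proba_1 > 0.9]
print(f"{len(good)} of {len(candidates)} sampled inputs have P(class 1) > 0.9")
```

Uniform sampling ignores correlations between features, so some accepted candidates may be unrealistic; restricting the search to plausible value ranges, or to perturbations of real samples, is usually more informative.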

Problem training an autoencoder for byte sequence classification

I am working on a classification task that uses byte sequences as samples. A byte sequence can be normalized as input to a neural network by applying x/255 to each byte x. In this way, I trained a simple MLP, and its accuracy is about 80%. Then I trained an autoencoder with 'mse' loss on the whole dataset to see whether it works well for the task. I froze the weights of the encoder's layers and added a softmax dense layer on top for classification. I retrained the new model (training only the last layer) and, to my surprise, the result was much worse than the MLP: only 60% accuracy.
Can't the autoencoder learn good features from all the data? Why is the result so bad?
Possible actions to take:
Check the reconstruction error of the autoencoder: can it really reconstruct its input?
Visualize the autoencoder's output (dimensionality reduction): is the variance explained with fewer dimensions?
A more complex model does not necessarily outperform a simpler one. Did you plot the validation MSE versus epoch? Does it reach a clear minimum after some number of steps?
Did you train for enough epochs?
How many units does your autoencoder have? Depending on the behavior and volume of your data, there may be too few, or too many (which risks overfitting).
Did you compare it with other dimensionality-reduction methods such as PCA or NMF?
Last but not least, is an autoencoder the best way to engineer features for this task?
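The PCA comparison from the checklist above can be set up in a few lines: apply the same x/255 normalization, measure how much variance a fixed number of components explains, compute a reconstruction MSE comparable to the autoencoder's 'mse' loss, and train a linear classifier on the compressed features (mirroring the frozen encoder plus softmax head). The random bytes, 64-byte sequence length, two classes and 16 components are hypothetical placeholders; on random data the accuracy will sit near chance, so only the real data makes the comparison meaningful.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1,000 sequences of 64 bytes, 2 classes
rng = np.random.default_rng(0)
X_bytes = rng.integers(0, 256, size=(1000, 64))
y = rng.integers(0, 2, size=1000)

X = X_bytes / 255.0  # same x/255 normalization as in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pca = PCA(n_components=16).fit(X_train)
explained = pca.explained_variance_ratio_.sum()
print("cumulative explained variance:", explained)

# Reconstruction error on held-out data, comparable to an autoencoder's MSE
X_rec = pca.inverse_transform(pca.transform(X_test))
rec_mse = np.mean((X_test - X_rec) ** 2)
print("reconstruction MSE:", rec_mse)

# Linear classifier on the compressed features (like a frozen encoder + softmax layer)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)
acc = clf.score(pca.transform(X_test), y_test)
print("accuracy on PCA features:", acc)
```

If the PCA features also lose accuracy relative to the plain MLP on the real data, that points to the compression step discarding label-relevant information rather than to a bug in the autoencoder.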
As for why the result is so bad: this is not actually a surprise. You've trained one model to be good at compressing the information. The transformations it learns at each layer do not need to be good for any other type of task at all. In fact, it could be throwing away a lot of information that is perfectly helpful for your auxiliary classification task but is not needed for the pure task of compressing and reconstructing the sequence.
Instead of training a separate autoencoder, you might have better luck simply adding sparsity penalty terms on the MLP layers to the loss function, or using other types of regularization such as dropout. Finally, you could consider more advanced network architectures, such as ResNet, ODE layers or Inception layers, adapted to 1D sequences.
