This question already has answers here:
What should I do when my neural network doesn't learn? (9 answers)
What should I do when my neural network doesn't generalize well? (4 answers)
I am building an AI-model-based intrusion detection system capable of identifying intrusions in real time, and I am training my model on the CICIDS2017 dataset.
My approach is to combine the individual packets obtained from a stream into a flow based on session keys (protocol, IPs, ports) and then pass the flow to the model for prediction. However, the model does not predict correctly on live traffic, even though it reaches 99% accuracy on the validation/test dataset.
I have tried different architectures based on LSTMs, CNNs, etc., but they all behave the same on real-time streaming data: the predictions are essentially random, meaning the model is as good as an untrained one, despite scoring 99% on the validation datasets.
Any hints or suggestions pointing in the right direction would be helpful.
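For context, this is roughly how I group the packets into flows. A minimal sketch with hypothetical packet field names (my real capture format may differ):

from collections import defaultdict

def flow_key(pkt):
    # Session key: protocol plus the two endpoints. Sorting the endpoint
    # pair is an assumption that maps both directions of a connection to
    # the same flow.
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (pkt["proto"],) + tuple(sorted([a, b]))

def packets_to_flows(packets):
    # Group packets from the stream by their session key.
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return flows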
Related
Somewhat of a beginner here on deep learning using Python and Stack Overflow.
I am currently working on something similar to sentiment analysis of community posts using an LSTM, and I have been trying to add preprocessing steps to clean up the text data.
I have lots of ideas, say 7, for modifying or dropping certain data without sacrificing context that I think could improve my prediction accuracy, but I want to see exactly how implementing one or more of these ideas affects the prediction accuracy.
So is there a tool, statistical method, or technique that will drastically cut down on the number of experiments (training the model + predicting on the test set) I need to run to see how "toggling on" one, two, or several of these preprocessing steps affects my prediction accuracy, instead of having to do something like 49 experiments and filling out the results in a 7x7 table? I have used the Taguchi method of design of experiments on a different kind of problem before, but I am not sure it applies properly here, since the neural network will be trained in a completely different way depending on the data it is fed.
Thank you for any input and advice!
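For illustration, one design in that spirit is a 2^(7-4) resolution-III fractional factorial, which screens 7 on/off preprocessing steps in only 8 runs. A hand-built sketch (the step names are hypothetical placeholders, not from my actual pipeline):

import itertools

# Base factors A, B, C form a full 2^3 design; the other four columns are
# generated as products (D=AB, E=AC, F=BC, G=ABC), giving 8 runs for 7 factors.
runs = [(a, b, c, a * b, a * c, b * c, a * b * c)
        for a, b, c in itertools.product([-1, 1], repeat=3)]

steps = ["stemming", "stopword_removal", "lowercasing", "emoji_stripping",
         "spell_correction", "lemmatization", "url_removal"]  # hypothetical names

for run in runs:
    enabled = [s for s, level in zip(steps, run) if level == 1]
    print(enabled)  # train + evaluate once per run with exactly these steps on

Each step's main effect is then estimated by contrasting the mean accuracy of the runs where it was on against the runs where it was off; interactions are aliased with main effects, which is the usual price of a screening design.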
This question already has answers here:
Is deep learning bad at fitting simple non linear functions outside training scope (extrapolating)? (1 answer)
Predicting sine with ANN using Keras (2 answers)
Unable to approximate the sine function using a neural network (4 answers)
Approximating the sine function with a neural network (3 answers)
Approximating sine function with Neural Network and ReLU (2 answers)
I have been experimenting with different kinds of ANNs to do regression on basic and increasingly complex functions. It seems to me, though, that I cannot get my network to learn cyclic functions like a sine wave. I have read on the web and on this forum that ANNs are generally not good at this job, but I can't seem to fathom why. Isn't learning any function within its domain the same?
For clarification, I am trying to fit a sine wave from x=0 to x=100 using the following setup:
import tensorflow as tf
from tensorflow import keras

def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(units=1, activation=None, input_dim=1,
                           kernel_initializer='random_normal'),
        keras.layers.Dense(units=64, activation='linear', use_bias=True),
        keras.layers.Dense(units=32, activation='relu', use_bias=True),
        keras.layers.Dense(units=64, activation='relu'),
        keras.layers.Dense(units=64, activation='linear', use_bias=True),
        keras.layers.Dense(units=32, activation='relu'),
        keras.layers.Dense(units=1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='mean_squared_logarithmic_error',
                  metrics=['mean_squared_error'])
    return model

# Create a basic model instance
model = create_model()

# Display the model's architecture
model.summary()
I have rescaled my data to fit into the [0,1]^2 space in (x, y) and fed it into the network. I gave the network 1000 points and left it to train for many epochs (~100,000), and these are the results I got:
[Plot: training curves showing overfitting]
[Plot: the network's predictions against the sine wave]
I can understand that this is standard overfitting behavior, but I can't understand why it behaves this way. In Goodfellow's Deep Learning (which I am in the process of reading), he explains that the optimal behavior of a machine learning algorithm lies between the overfitting and underfitting regions. It seems, then, that the model I have created is not converging toward the solution and can be expected to perform worse with further training!
Does this mean it can't interpolate the sine function? Also, why is this function so much more demanding computationally (most simple functions I tried converged in <1000 epochs) than others? Does it mean it requires more layers, or maybe more units per layer? I understand the problem to be a classic regression problem, for which I thought sequential models were good.
Last but not least, I know that ANNs are not the way to go for periodic functions, but I am trying to understand why they struggle with this as a regression method.
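For reference, a minimal sketch of an alternative setup I could compare against (a linear output unit and plain MSE instead of the sigmoid/MSLE combination above; an assumption on my part, not a known fix). Inside the training interval such a network can fit sin(x) closely, but beyond it a feed-forward network extrapolates according to its activations rather than repeating the cycle:

import numpy as np
import tensorflow as tf

# 1000 training points of sin(x) on [0, 2*pi], inputs rescaled to [0, 1].
x = np.linspace(0.0, 2.0 * np.pi, 1000).astype("float32")
y = np.sin(x).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1, activation=None),  # linear output: targets lie in [-1, 1]
])
model.compile(optimizer="adam", loss="mse")
model.fit(x / (2.0 * np.pi), y, epochs=2000, batch_size=64, verbose=0)

# Inside [0, 2*pi] the fit is close; beyond it the network extrapolates
# with its activations' asymptotic shape rather than periodically.
x_test = np.linspace(0.0, 4.0 * np.pi, 50).astype("float32")
predictions = model.predict(x_test / (2.0 * np.pi), verbose=0)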
Closed. This question needs to be more focused. It is not currently accepting answers.
I am trying to implement and reproduce the results of federated BERT pretraining from the paper "Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos". I prefer to use the TensorFlow code for BERT pretraining.
For training in a federated way, I first divided the dataset into 3 different silos (each containing the discharge summaries of 50 patients, using MIMIC-III data) and then pretrained a BERT model on each silo's dataset using the TensorFlow implementation of BERT pretraining from the official BERT release.
Now I have three models, each pretrained on a different dataset. For model aggregation, I need to take the average of all three models. Since the number of notes in each silo is equal, averaging amounts to summing all the models' weights and dividing by three.
How do I take the average of the models as done in the paper? Please give me some insight into coding this correctly. The idea of averaging the model weights is taken from the paper "Federated Learning: Strategies for Improving Communication Efficiency".
I am very new to deep learning and TensorFlow, so please help me figure out the issue and suggest some reading material for TensorFlow.
The paper also mentions that this is a good option for overcoming privacy and regulatory issues around sharing clinical data. My question is: is it possible to recover sensitive data from the model.ckpt files? If so, how?
Any help would be appreciated. Thanks.
Model averaging can be done in many ways. The simplest is to keep a complete copy of the architecture in each silo and take a (weighted) average of their parameter values, using this as the parameters of the full model. However, there are a number of practical issues (latency, network speed, computational power of the devices) which may prohibit this, so more complex solutions, where silos are only trained on subsets of variables and so on, are used (as in the paper you cite).
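A sketch of the simple case, uniform averaging of three identically structured Keras models, matching your equal silo sizes (this is my illustration, not the paper's code; averaging raw BERT checkpoints would work analogously on the checkpoint tensors):

import numpy as np
import tensorflow as tf

def average_models(models):
    # Element-wise mean of corresponding weight tensors across the models.
    averaged = [np.mean(weights, axis=0)
                for weights in zip(*(m.get_weights() for m in models))]
    # Clone one model's architecture and load the averaged parameters into it.
    global_model = tf.keras.models.clone_model(models[0])
    global_model.set_weights(averaged)
    return global_model

# Usage: global_model = average_models([silo_1_model, silo_2_model, silo_3_model])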
It is not generally possible to retrieve information (sensitive or otherwise) about a dataset purely from the parameter updates of a model fine-tuned on it.
Closed. This question needs to be more focused. It is not currently accepting answers.
If you check my GitHub, I have successfully implemented a CNN and KNN for classifying signal faults. For that, I took the signal, applied a little preprocessing for dimensionality reduction, and provided it to the network; using the class information I trained the network, and later I tested the trained network with test samples to determine the class and computed the accuracy.
My question is how to input text information to a CNN or any other network. For input, I took the Twitter database from Kaggle and selected 2 columns containing name and gender information. I have gone through some algorithms that classify gender based on blog data, but it wasn't clear how to apply them to my data (in my case, I want to classify using names alone).
In some of the examples I understood, a sparse matrix is computed for the text, but for 20,000 samples the sparse matrix is too huge to give as input. I have no problem implementing CNN architectures (I want to use them because no hand-crafted features are required) or any other network. I am stuck on how to input the data to the network. What kind of conversions can I apply so that the names and gender information can be used to train the network?
If my way of thinking is wrong, please suggest which algorithm is the best approach. Deep learning or any other methods are OK!
You could use character-level embeddings (i.e., your input classes are the different characters, so 'a' is class 1, 'b' is class 2, etc.). One-hot encoding the classes and then passing them through an embedding layer will yield a unique representation for each character. A string can then be treated as a character sequence (or, equally, a vector sequence), which can be used as input for either a recurrent or a convolutional network. If you feel like reading, this paper by Kim et al. will provide you with the necessary theoretical background.
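A minimal sketch of that idea with a convolutional network (the names, labels, and hyperparameters below are made-up placeholders, not your Kaggle data):

import numpy as np
import tensorflow as tf

# Toy data: names and binary gender labels (hypothetical examples).
names = ["alice", "bob", "carol", "dave"]
labels = np.array([1, 0, 1, 0])

# Map each character to an integer class ('a' -> 1, 'b' -> 2, ...; 0 is padding).
chars = sorted({c for name in names for c in name})
char_to_id = {c: i + 1 for i, c in enumerate(chars)}

max_len = 12
x = np.zeros((len(names), max_len), dtype="int32")
for row, name in enumerate(names):
    for col, c in enumerate(name[:max_len]):
        x[row, col] = char_to_id[c]

model = tf.keras.Sequential([
    # The embedding layer turns each character id into a dense vector.
    tf.keras.layers.Embedding(input_dim=len(chars) + 1, output_dim=16),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=10, verbose=0)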
Closed. This question needs to be more focused. It is not currently accepting answers.
I am just getting started with machine learning and am exploring different algorithms. I took a binary classification problem from the internet and tried applying various machine learning techniques.
First I ran a naive Bayes classifier on it and found a success rate of about 75%. Then I tried logistic regression and found a staggering success rate of 90%. I applied regularization to my classifier, and here is the curve I found when I varied lambda (the regularization parameter) over 30 values:
[Plot: error vs. lambda; the red curve is the training set and the blue one is the validation set]
As you can see, the error in both curves increases with lambda. I think this suggests that my hypothesis is underfit to begin with and that the underfitting gets worse as lambda increases. Is this the correct way to interpret it?
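For reproducibility, this is roughly how I set up the sweep (a sketch with synthetic stand-in data, not my actual dataset; note that scikit-learn parameterizes regularization as C = 1/lambda):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 31-feature binary classification problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 31))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for lam in np.logspace(-3, 3, 30):
    clf = LogisticRegression(C=1.0 / lam, max_iter=1000).fit(X_tr, y_tr)
    # Plot these two error rates against lam to reproduce the curve.
    print(lam, 1 - clf.score(X_tr, y_tr), 1 - clf.score(X_val, y_val))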
Either way, to tackle the underfitting it would make sense to try a more complex model, so I turned to a neural network. My problem has 31 features, and I chose a network with two hidden layers of 10 nodes each.
After training, I found that it classifies only 65% of the training data correctly. That is worse than both naive Bayes and logistic regression. How often does this happen? Is it more likely that something is wrong with my implementation of the neural network?
It is also interesting to note that the neural network seems to converge after just 25-30 iterations, whereas my logistic regression took 300 iterations to converge. I did consider the possibility that the neural network might be getting stuck in a local minimum, but according to Andrew Ng's excellent course on machine learning, which I am following, that is rather unlikely.
From what the course explained, a neural network generally gives better predictions than logistic regression, but you may run into problems with overfitting. However, I don't think that is the problem here, since the 65% success rate is on the training set.
Do I need to go over my neural network implementation, or is this something that can happen?
First, please try larger hidden layers, such as 200 nodes each, then update your results so we can see what the critical problem is.
When you use a neural network to classify your data, it actually fits a vector space in which the task becomes separable. In this case, since your data has 31 dimensions, a representation of at least 32 dimensions can in principle classify your training data perfectly, provided no sample appears in both the positive and the negative class. So if you get bad performance on the training set, just enlarge your neural network until you reach 100% accuracy on the training set; only then should you start thinking about the generalization problem.
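A minimal sketch of that advice in Keras (synthetic stand-in data; the 200-node layers are the suggestion above, not a verified fix for your specific dataset):

import numpy as np
import tensorflow as tf

# Synthetic stand-in for the 31-feature binary problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 31)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")  # a simple nonlinear rule

model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="relu", input_shape=(31,)),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=200, verbose=0)

# If training accuracy stays far from 100%, suspect the implementation or
# the optimization before worrying about generalization.
loss, accuracy = model.evaluate(X, y, verbose=0)
print(accuracy)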