Questions about hyperparameter tuning in Keras/Tensorflow [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have been studying deep learning recently, relying mainly on Andrew Ng's Deep Learning Specialization on Coursera.
I want to build my own model to classify MNIST with 99% accuracy (a simple MLP model, not a CNN). So I use KerasClassifier to wrap my model and GridSearchCV to tune the hyperparameters (including the number of hidden layers, the number of units, the dropout rate, etc.).
However, when I google "fine tuning", most of the results are about "transfer learning", which only involves tuning the learning rate, the output layer, or the number of frozen layers.
I know these famous models can handle many problems with only small changes. But what if I want to build a tiny model from scratch to handle a specific problem: what is the common/best practice?
So my questions are mainly about the common/best practice for tuning a model:
What is the common/best way to tune hyperparameters? (I have seen people tune them manually, use scikit-learn's RandomizedSearchCV/GridSearchCV, or use hyperas.)
Should I use k-fold cross-validation? (It is the default behaviour of GridSearchCV, and it greatly increases the training time while seeming to help little.)
Is it enough to solve most problems by slightly modifying off-the-shelf models? If not, in what direction should I move?
Thanks!

In deep learning, fine tuning usually refers to retraining certain layers of a pretrained model on data from a new domain. I believe you are looking for something like "hyperparameter optimisation". There are various ways to approach this, and people have their own preferences about the best method. Using RandomizedSearchCV/GridSearchCV or a combination of the two is fine. Trial and error with a few configurations is fine. People sometimes retrain well-known architectures completely. Others resort to more complicated approaches, such as optimising hyperparameters with evolutionary programming (genetic algorithms). So you're kinda spoiled for choice.
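As a concrete starting point, here is a rough sketch of wrapping a small Keras MLP with KerasClassifier and searching two hyperparameters with GridSearchCV. It assumes the SciKeras wrapper (pip install scikeras); the layer sizes, dropout rates, and the subsample size are placeholders, not recommendations.

    # A rough sketch, not a drop-in solution: wrap a small Keras MLP for MNIST
    # with SciKeras' KerasClassifier and grid-search two hyperparameters.
    import numpy as np
    from tensorflow import keras
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    def build_mlp(hidden_units=128, dropout_rate=0.2):
        model = keras.Sequential([
            keras.layers.Input(shape=(784,)),
            keras.layers.Dense(hidden_units, activation="relu"),
            keras.layers.Dropout(dropout_rate),
            keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

    clf = KerasClassifier(model=build_mlp, epochs=5, batch_size=128, verbose=0)
    param_grid = {
        "model__hidden_units": [128, 256, 512],
        "model__dropout_rate": [0.0, 0.2, 0.5],
    }
    search = GridSearchCV(clf, param_grid, cv=3)
    search.fit(x_train[:10000], y_train[:10000])  # subsample to keep the search quick
    print(search.best_params_, search.best_score_)

RandomizedSearchCV can be swapped in for GridSearchCV with the same estimator once the grid gets large.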
Cross-validation is usually only required when your dataset is small, where splitting further into train and test sets reduces the size of the test set and decreases the statistical significance (reliability) of your results. If you have enough data, you can go ahead and just split into train/val/test sets. But cross-validation is usually necessary in practice, since large amounts of labelled data tend to be rare.
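If your dataset is large enough to skip cross-validation, a plain train/val/test split is only a couple of lines; a minimal sketch with placeholder data (the split ratios are arbitrary):

    # A minimal sketch of a train/validation/test split, assuming enough data
    # that k-fold cross-validation is unnecessary. The ratios are arbitrary.
    import numpy as np
    from sklearn.model_selection import train_test_split

    x = np.random.rand(1000, 20)         # placeholder features
    y = np.random.randint(0, 2, 1000)    # placeholder labels

    x_trainval, x_test, y_trainval, y_test = train_test_split(x, y, test_size=0.15, random_state=0)
    x_train, x_val, y_train, y_val = train_test_split(x_trainval, y_trainval, test_size=0.15, random_state=0)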
If off-the-shelf models are available for the problem you are trying to solve, use them! While familiarising yourself with the implementation and source code is insightful, there's no need to reinvent the wheel and rewrite these solutions!

Related

Efficient way to compare effects of adding/removing multiple data-cleaning steps on the performance of deep learning model?

Somewhat of a beginner here at deep learning with Python, and at Stack Overflow.
I am currently working on something similar to a sentiment analysis of community posts using LSTM, and have been trying to add preprocessing steps to clean up the text data.
I have lots of ideas - say, 7 - for modifying/dropping certain data without sacrificing context that I think could improve my prediction accuracy, but I want to be able to see exactly how implementing one or some of these ideas can affect the prediction accuracy.
So is there a tool, statistical method, or technique that will drastically cut down the number of experiments (training the model + predicting on the test set) I need in order to see how "toggling on" one, two, or several of these preprocessing steps affects my prediction accuracy, instead of having to run something like 49 experiments and fill out the results in a 7x7 table? I have used the Taguchi method of design of experiments on a different kind of problem before, but I am not sure it applies here, since the neural network will be trained in a completely different way depending on the data it is fed.
Thank you for any input and advice!

TensorFlow model saving and calculating the average of models [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am trying to implement and reproduce the results of federated BERT pretraining from the paper "Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos". I prefer to use the TensorFlow code for BERT pretraining.
To train in a federated way, I first divided the dataset into 3 silos (each containing the discharge summaries of 50 patients, using MIMIC-III data), and then pretrained a BERT model on each silo using the TensorFlow implementation of BERT pretraining from the official BERT release.
Now I have three models, each pretrained on a different dataset. For model aggregation, I need to take an average of all three models. Since the number of notes in each silo is equal, averaging means summing all the models and dividing by three.
How do I take the average of the models as done in the paper? Please give me some insights on how to code this correctly. The idea of averaging the model weights is taken from the paper "Federated Learning: Strategies for Improving Communication Efficiency".
I am very new to deep learning and TensorFlow, so please help me figure out the issue and suggest some reading material for TensorFlow.
The paper mentions that this is a good option for overcoming privacy and regulatory issues when sharing clinical data. My question is: is it possible to recover sensitive data from these model.ckpt files? If so, how?
Any help would be appreciated. Thanks...
Model averaging can be done in many ways. The simplest is to keep a complete copy of the architecture in each silo, take a (weighted) average of their parameter values, and use this as the parameters of the global model. However, there are a number of practical issues (latency, network speed, computational power of the devices) which may prohibit this, so more complex solutions are used in which silos only train subsets of the variables, etc. (as in the paper you cite).
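To illustrate the simplest option, here is a hedged sketch of plain weight averaging for Keras models that share an architecture; for the official BERT model.ckpt checkpoints you would do the same thing per variable (e.g. via tf.train.load_checkpoint) rather than on Keras model objects.

    # A sketch of unweighted federated averaging across models with identical
    # architectures (the equal-sized-silo case described above).
    import numpy as np
    from tensorflow import keras

    def average_models(models):
        """Return a new model whose weights are the element-wise mean of the inputs."""
        averaged = keras.models.clone_model(models[0])
        weight_sets = [m.get_weights() for m in models]
        mean_weights = [np.mean(w, axis=0) for w in zip(*weight_sets)]
        averaged.set_weights(mean_weights)
        return averaged

    # usage: global_model = average_models([silo_1_model, silo_2_model, silo_3_model])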
It is not generally possible to retrieve information (sensitive or otherwise) about a dataset purely from the parameter updates to a model fine-tuned on it.

How do you know which layers and how many layers to use? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to machine learning and have spent some time learning Python. I have started to learn TensorFlow and Keras for machine learning, and I have no real understanding of the process of building a model. How do you know which models to use? Which activation functions to use? How many layers, and what output dimensions?
I've noticed most models are of the Sequential type and tend to have 3 layers. Why is that? I couldn't find any resources that explain which to use, why we use them, and when. The best I could find was TensorFlow's function documentation. Any elaboration or any resources to clarify would be greatly appreciated.
Thanks.
I'd suggest you continue to read more about machine learning. This multi-part series is one place to start (disclaimer: I don't know the author and this is not my own work): https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12
Also, I suggest a simple thought experiment: take a binary classification problem and consider how the different shapes of the activation functions may affect your results.
Regarding your model choice: This is highly dependent on your data and what you wish to explore. If I were you, I'd try to visualize your data first to see if there are any interesting relationships.
For example, a seaborn pairplot (https://seaborn.pydata.org/generated/seaborn.pairplot.html) is one way to visualize relationships between variables. If you have a lot of data points, I'd suggest only using a sample of at most a few hundred data points as this plot can take a long time to make otherwise. You can also try DataShader, but I haven't used it personally.
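For instance, a pairplot on a subsample takes only a few lines; here is a small sketch using seaborn's bundled iris dataset as a stand-in for your own dataframe.

    # A small sketch of the pairplot suggestion above, using seaborn's bundled
    # iris dataset as a stand-in for your own dataframe.
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = sns.load_dataset("iris")
    sample = df.sample(n=min(len(df), 300), random_state=0)  # subsample large datasets
    sns.pairplot(sample, hue="species")
    plt.show()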
Once you visualize your data, try to think about what the relationships between the variables might mean. Doing all of this before reaching for machine learning models will guide you later as you try to implement some of the models in the series above.
Also, deep learning is sometimes not the best approach. Often (depending on whether you have a classification or regression problem) linear (or multiple linear) regression will suffice. For a regression problem, I often start with (multiple) linear regression as my baseline model and then improve upon it with regularization before I try fancy deep neural networks.
Deep neural networks are slower to train than linear models, can easily overfit your data, and can even give the same (or worse!) results as the simpler linear regression. Consider whether you are being a hammer in search of a nail by applying deep learning to every problem from the get-go.
Hope this helps.
TLDR:
Visualize your data and figure out if you want to do regression or classification
Start with simple linear models as a baseline and compute performance metric (ex. MSE)
Improve (hopefully) with neural networks and see if the additional gain is worth it in your case. At some point, you may have to experiment with different activation functions to see which suits your problem best (a rough sketch of this workflow follows below).
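Here is a rough sketch of that workflow on a synthetic regression problem (the dataset, layer sizes, and epoch count are placeholders): fit a linear baseline, add regularisation, then try a small neural network and compare MSE.

    # A rough sketch of the baseline-first workflow: linear regression, then
    # ridge regularisation, then a small Keras MLP, all compared on test MSE.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from tensorflow import keras

    X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
        model.fit(X_train, y_train)
        print(name, mean_squared_error(y_test, model.predict(X_test)))

    mlp = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),
    ])
    mlp.compile(optimizer="adam", loss="mse")
    mlp.fit(X_train, y_train, epochs=50, verbose=0)
    print("mlp", mean_squared_error(y_test, mlp.predict(X_test, verbose=0).ravel()))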
Nobody really knows why certain architectures work; that is still a topic of ongoing discussion (see, e.g., this paper).
Finding architectures that work is mostly trial and error, and adopting or modifying existing architectures that seem to work well for related tasks and dataset sizes.
I would refer you to Goodfellow, Bengio, and Courville's book, it is a great resource to get started with machine learning and deep learning in particular.

Will a neural network always outperform a logistic regression classifier? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I am just getting started with machine learning and am exploring different algorithms. I took a binary classification problem from the internet and tried applying various machine learning techniques.
First I ran a naive Bayes classifier on it and got an accuracy of about 75%. I then tried logistic regression and got a staggering 90%. I applied regularisation to my classifier and varied lambda (the regularisation parameter) over 30 values; in the resulting plot (not shown), the red curve is the training set and the blue one is the validation set. The error in both curves increases with lambda. I think this suggests that my hypothesis is underfit to begin with and that the underfitting gets worse as lambda increases. Is this the correct way to interpret it?
Either way, in order to tackle the underfitting, it made sense to try a more complicated model, so I turned to a neural network. My problem has 31 features, and I chose a network with two hidden layers of 10 nodes each.
After training I found that it classifies only 65% of the training data correctly. That is worse than both naive Bayes and logistic regression. How often does this happen? Is it more likely that there is something wrong with my implementation of the neural network?
It is also interesting that the neural network seems to converge after just 25-30 iterations, while my logistic regression took 300 iterations to converge. I did consider the possibility that the neural network might be getting stuck in a local minimum, but according to Andrew Ng's excellent course on machine learning, which I am following, that is rather unlikely.
From what the course explained, a neural network generally gives better predictions than logistic regression, but you may run into problems with overfitting. However, I don't think that is the problem here, since the 65% success rate is on the training set.
Do I need to go over my neural network implementation or is this a possible thing that can happen?
First, please try larger hidden layers, such as 200 nodes each, and then update your results so we can see what the critical problem is.
When you use a neural network to classify your data, it effectively fits a vector space that is suitable for the task. In this case, since your data has 31 dimensions, a space of at least 32 dimensions can in principle classify your training data perfectly, as long as no identical sample appears in both the positive and negative class. So if you get bad performance on the training set, just enlarge your neural network until you get close to 100% accuracy on the training set; only then start thinking about the generalization problem.
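For what it's worth, the "wider hidden layers" suggestion above looks roughly like this in Keras; the 200-unit layers follow the suggestion above, and the rest is a hedged sketch, not a tuned model.

    # A hedged sketch of an MLP for a 31-feature binary classification problem
    # with two hidden layers of 200 units, as suggested above. Not a tuned model.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(31,)),
        keras.layers.Dense(200, activation="relu"),
        keras.layers.Dense(200, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=100, validation_split=0.2)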

What is the correct way to prepare dataset for machine learning? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
First of all, thank you for reading this post.
I am a noob when it comes to machine learning, and I am trying to use ML to classify some data. I have done some basic reading on supervised and unsupervised learning algorithms such as decision trees, clustering, neural networks, etc.
What I'm struggling to understand is the correct overall procedure for preparing datasets for a ML problem.
How do I prepare the dataset for ML so that I can measure the accuracy of the algorithms?
My current understanding is that, to assess accuracy, the algorithm should be fed pre-labelled examples (from a significant subset of the dataset?) so that the difference between the expected outcome and the algorithm's decision can be measured. Is that right?
If this is correct then how does one go about pre-labelling large datasets? My dataset is quite big and manual labelling is not feasible.
Also, any tips on doing machine learning in Python would be much appreciated!
Thank you very much for your help in advance!
Best regards,
Mike
This is the most important part of any machine learning project. You need to build your dataset and extract, scale, and normalize features.
If you want to use a supervised learning algorithm, you need labeled data. There are several ways to achieve this:
Label it by hand.
Use some unsupervised learning algorithm to label the data.
You should use a Python machine-learning toolkit, for example scikit-learn. scikit-learn contains many useful tools for data wrangling, feature extraction, and preprocessing. For example, it can vectorize your data with DictVectorizer, and you can impute missing values and scale and normalize features using only scikit-learn.
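As an illustration of those preprocessing steps, here is a small sketch with made-up records: vectorise dict-style data, impute a missing value, and scale the features.

    # A small sketch of the preprocessing steps mentioned above, on made-up data:
    # vectorise dict-style records, impute missing values, and scale features.
    import numpy as np
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    records = [
        {"city": "London", "temperature": 12.0},
        {"city": "Paris", "temperature": np.nan},   # missing value
        {"city": "London", "temperature": 18.0},
    ]

    pipeline = make_pipeline(
        DictVectorizer(sparse=False),    # one-hot encodes "city", passes numbers through
        SimpleImputer(strategy="mean"),  # fills the missing temperature
        StandardScaler(),                # zero mean, unit variance per feature
    )
    print(pipeline.fit_transform(records))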
I recommend starting with the examples here - http://scikit-learn.org/stable/
