Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I am trying to implement and reproduce the results of federated Bert pertaining in paper
Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos.
I prefer to use TensorFlow
code of Bert pretraining.
For training in a federated way, initially, I had divided dataset into 3 different silos(each of that contains discharge summary of 50 patients, using mimic-3 data). and then pretrained the Bert model for each dataset using TensorFlow
implementation of Bert pretraining from the official release of Bert.
Now I have three different models that are pretrained from a different dataset. for model aggregation, I need to take an average of all three models. since the number of notes in each silo is equal, for averaging I need to do sum all models and divide by three.
How to take avg of models as did in the paper? somebody, please give me some insights to code this correctly. The idea of averaging the model weight is taken from the paper FEDERATED LEARNING: STRATEGIES FOR IMPROVING
COMMUNICATION EFFICIENCY
.
I am very new to deep learning
and TensorFlow
. so someone please help me to figure out the issue and suggest some reading material for TensorFlow
.
In the paper, it is mentioned that It is a good option to overcome privacy and regulatory issues while sharing of clinical data. My question is
is it possible to get sensitive data from this model.ckpt files? Then how?
Any help would be appreciated. Thanks...
Model averaging can be done in many ways. The simplest is to have a complete copy of each architecture in each silo, and take a (weighted) average of their parameter scores, and use this as the parameters for the full model. However there are a number of practical issues (latency, network speed, computational power of device) which may prohibit this, and so more complex solutions where silos are only trained on subsets of variables etc are used (as in the paper you cite).
It is not generally possible to retrieve information (sensitive of otherwise) from a dataset purely from the parameter updates to a model fine-tuned on it.
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
I want to create two separate CNN models from a single CNN model. Let me name it as CNN-A and CNN-B.
i.e., Original CNN model = CNN-A model + CNN-B model
During the prediction, the raw input dataset are feed to the CNN-A. The output of the CNN-A are feed as input to the CNN-B. The original output of the original model is the output of CNN-B model.
To implement the above architecture, I would like get your suggestions and idea if any, please?
The implementation seems redundant. The reason is the input to each CNN should be an image. And let's say the output of the first CNN network is an image and you feed that to the second CNN, this is the same as stacking multiple convolution layers with additional dropouts and whatnot for the One CNN model.
So after all implementing a deep CNN will mimic the architecture you want.
You can also take a look at transfer learning, which is able to utilize a pre-trained model's layers and you are capable to add your own final layers and make adjustments. This is also similar to what you are talking about.
================ After Comment =====================
You could use a model architecture like mobilenet for a model to be deployed on your mobile.
You could also apply transfer learning to existing pre-trained mobilenet models which will save a lot of time and resources.
Lastly, you could deploy the model(used for computers) on a server using Flask. Then create an API that will provide predictions when you send the relevant data to the server via a POST request. This is commonly used to reduce the load on mobiles and this the approach that I would prefer. This method is relatively efficient and is easily scalable.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am studying deep learning recently, mainly rely on Andrew Ng's Deep Learning Specialization on Coursera.
And I want to build my own model to classify MNIST with 99% accuracy (simple MLP model, not CNN). So I use KerasClassifier to wrap my model and use GridsearchCV to fine tune the hyperparameters (including hidden layer number, units number, dropout rate, etc.)
However, when I google "fine tuning", the majority of the results are mainly on "transfer learning", which are just tuning the learning rate, output layer number or freeze layer number.
I know these famous models are capable to deal with many problems with just a little changes. But what if I want to build a tiny model from scratch to handle a special question, what are the common/best practice?
So my questions are mainly about the common/best practice of fine tuning model:
What is the common/best way to fine tuning? (I have seen people tune hyperparameters manually, or using scikit-learn's RandomizedSearchCV/GridSearchCV, or hyperas)
Should I use k-fold cross validation? (Because it's the default set of GridSearchCV, and it extremely increases the training time but helps little)
Is it enough to solve most problems by slightly modifying the off-the-shelf models? If not, to what direction should I move on?
Thanks!
In deep learning, fine tuning usually refers to retraining certain layers of a pretrained model on data from a new domain. I believe you are looking for something like "hyperparameter optimisation". There are various ways to approach this. People have their own preferences regarding the best method. Using RandomizedSearchCV/GridSearch CV or a combination of the two is fine. Trial and error using a few configurations is fine. People sometimes retrain well known architectures completely. Others resort to more complicated approaches, such as opmtimising hyperparameters using evolutionary programming (genetic algorithms). So you're kinda spoiled for choice.
Cross validation is usually only required if your data set is small, and further splitting into a train and test set reduces the size of the test set, decreasing statistical significance (reliability) of your results. If you have enough data, you can go ahead and just split into train/val/test sets. But its usually necessary since since large amounts of labelled data tend to be rare.
If off-the-shelf models are available for the problem you are trying to solve, use them! While familiarising yourself with the implementation and source code is insightful, there's no need to reinvent the wheel and rewrite these solutions!
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am new to machine learning and have spent some time learning python. I have started to learn TensorFlow and Keras for machine learning and I literally have no clue nor any understanding of the process to make the model. How do you know which models to use? which activation functions to use? The amount of layers and dimensions of the output space?
I've noticed most models were the Sequential type, and tend to have 3 layers, why is that? I couldn't find any resources that explain which to use, why we use them, and when. The best I could find was tensorflow's function details. Any elaboration or any resources to clarify would be greatly appreciated.
Thanks.
I'd suggest you continue to read more about machine learning. The one here is a multi-part explanation. Disclaimer: I don't know the author and this is not my own work.
Also, I suggest a simple thought experiment where you have binary classification and have to consider how the different shapes of the activation functions may affect your results.
https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12
Regarding your model choice: This is highly dependent on your data and what you wish to explore. If I were you, I'd try to visualize your data first to see if there are any interesting relationships.
For example, a seaborn pairplot (https://seaborn.pydata.org/generated/seaborn.pairplot.html) is one way to visualize relationships between variables. If you have a lot of data points, I'd suggest only using a sample of at most a few hundred data points as this plot can take a long time to make otherwise. You can also try DataShader, but I haven't used it personally.
Once you visualize your data, then try to actually think about what these relationships might mean between the variables. Doing all of this before using these machine learning models will guide you later as you try to implement some of the models in the above post.
Also, sometimes some deep learning algorithm is not the best approach. Often times (depending on whether you are doing a classification or regression problem) linear (or multiple linear) regression will suffice. For a regression problem, I often start with (multiple) linear regression as my baseline model and then improve upon it with regularization before I try fancy deep neural networks.
Deep neural networks are slower to train then linear models, can easily overfit your data, and can even give the same (and even worse!) results as the simpler linear regression. Consider whether you are trying to be a hammer in search of a nail when outright applying deep learning to a multitude of problems from the get-go.
Hope this helps.
TLDR:
Visualize your data and figure out if you want to do regression or classification
Start with simple linear models as a baseline and compute performance metric (ex. MSE)
Improve (hopefully) with neural networks and see if the additional gain is worth it in your case. At some point, you may have to experiment with different activation functions to see which suits your case more.
Nobody really knows why certain architectures work, that is still a topic of ongoing discussion (see, e.g., this paper).
Finding architectures that work is mostly trial and error, and adopting or modifying existing architectures that seem to work well for related tasks and dataset sizes.
I would refer you to Goodfellow, Bengio, and Courville's book, it is a great resource to get started with machine learning and deep learning in particular.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
If you check my github, I have successfully implemented CNN, KNN for classifying signal faults. For that, I have taken the signal with little preprocessing for dimensionality reduction and provided it to the network, using its class information I trained the network, later the trained network is tested with testing samples to determine the class and computed the accuracy.
My question here how do I input the text information to CNN or any other network. For inputs, I took the Twitter database from kaggle, I have selected 2 columns which have names and gender information. I have gone through some algorithms which classify gender based on their blog data. I wasn't clear how I implement to my data (In my case, if I just want to classify using only names alone).
In some examples, which I understood I saw computing sparse matrix for the text, but for 20,000 samples the sparse matrix is huge to give as input. I have no problem in implementing the CNN architectures(I want to implement because no features are required) or any other network. I am stuck here, how to input data to the network. What kind of conversations can I make so that I take the names and gender information can be considered to train the network?
If my method of thinking is wrong please provide me suggestion which algorithm is the best way. Deep learning or any other methods are ok!
You could use character-level embeddings (i.e. your input classes are the different characters, so 'a' is class 1, 'b' is class 2 etc..). One-hot encoding the classes and then passing them through an embedding layer will yield unique representations for each character. A string can then be treated as a character-sequence (or equally a vector-sequence), which can be used as an input for either a recurrent or convolutional network. If you feel like reading, this paper by Kim et al. will provide you all the necessary theoretical backbone.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
First of all, thank you for reading this post.
I am a noob when it comes to machine learning and I am trying to use ML to classify some data. Now I have done some basic reading on supervised and unsupervised learning algorithms such as decision trees, clustering, neural networks ..etc.
What I'm struggling to understand is the correct overall procedure for preparing datasets for a ML problem.
How do I prepare the dataset for ML so that I can measure the accuracy of the algorithms?
My current understanding is that to assess accuracy, the algorithm should be fed with pre-labelled results (from a significant subset of the dataset?) so as to assess the difference between the expected outcome and the algorithm's decision?
If this is correct then how does one go about pre-labelling large datasets? My dataset is quite big and manual labelling is not feasible.
Also, any tips on doing machine learning in Python would be much appreciated!
Thank you very much for your help in advance!
Best regards,
Mike
This is the most important part of any machine learning algorithm. You need to build your dataset, extract, make, scale, normalize features.
If you want to use some supervised learning algorithm, you need labeled data. There is several ways to achieve this:
Lebel it by hand.
Use some unsupervised learning algorithm to label data.
You need to use some python machine-learning toolkit, for example - scikit-learn. scikit-learn contains many useful tools for data mangling, feature extraction and preprocessing. For example, it can vectorize your data with DictVictorizer. You can add missing values, scale and normalize features using only scikit-learn.
I recommend to start with examples here - http://scikit-learn.org/stable/