Training BERT on a new dataset - python

I am new to BERT.
I have an Amazon review dataset, where I want to predict the star rating based on the review.
I know I can use a pretrained BERT model as shown here.
But I want to train the BERT model on my own dataset. Is that what's being done here? And can I apply this type of 'fine tuning' on a pretrained model with any dataset to get more accurate results, or do I have to do something else to train the model from scratch?
And if I do want to train a model from scratch, where would I start?

First of all, what is pre-training? This procedure helps the model learn features of the language, ranging from syntactic to semantic (it is a spectrum), using an enormous amount of raw text (40 GB) and a lot of processing power.
Objective function: causal language modeling and masked language modeling.
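
To see what the masked language modeling objective looks like from the outside, here is a tiny illustration (mine, not part of the original answer) using the Transformers fill-mask pipeline: the pre-trained model predicts the word hidden behind [MASK] from its context.

    # Minimal fill-mask demo; model name is the standard pre-trained BERT checkpoint.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in fill("The movie was absolutely [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))
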
What about fine-tuning a pre-trained model? Suppose there is a model which has knowledge about the general aspects of the English language (POS, dependency trees, subjects... a little of everything). Fine-tuning helps us direct the model's focus towards the most important features in our dataset; say in your dataset some syntactic feature is the game-changer, and the model should pay attention to it.
Objective function: based on the downstream task.
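
In the asker's case (predicting 1-5 stars from a review), fine-tuning would look roughly like the sketch below, assuming a CSV file "reviews.csv" with "review" and "stars" columns; those names, and the hyperparameters, are illustrative assumptions, not a prescribed setup.

    # A minimal fine-tuning sketch with Hugging Face Transformers.
    import pandas as pd
    import torch
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    df = pd.read_csv("reviews.csv")            # hypothetical file: review text plus 1-5 stars
    texts = df["review"].tolist()
    labels = (df["stars"] - 1).tolist()        # map 1..5 stars to class ids 0..4

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)

    class ReviewDataset(torch.utils.data.Dataset):
        def __init__(self, encodings, labels):
            self.encodings, self.labels = encodings, labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                               num_labels=5)
    args = TrainingArguments(output_dir="bert-stars", num_train_epochs=2,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=ReviewDataset(encodings, labels)).train()
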
Training from scratch isn't feasible for most of us, but there is an approach to continue the pre-training phase using your own (task-specific) corpus/corpora without (hopefully) damaging the model's existing knowledge.
Objective function: causal language modeling and masked language modeling.
Here is an article about this approach and its effectiveness, and you can take inspiration from SciBERT and CovidBERT. As you would expect, they use pre-trained BERT as a starting point and continue pre-training on a domain-specific corpus.

Related

Why are some TensorFlow Hub models not fine-tunable?

I am just learning image classification with TensorFlow and found that there is TensorFlow Hub, where we can use a lot of models for a lot of classification tasks. For example, I want to build a food classifier and develop the model so that it covers foods in my country and has higher accuracy on some specific foods. I tried to use and tune this model: https://tfhub.dev/google/aiy/vision/classifier/food_V1/1, but why is there a note that the model is not fine-tunable?
What makes a model fine-tunable or not fine-tunable?
Thank you.
The publisher/creator of the model makes the decision on whether the model is fine-tunable or not. Making a model fine-tunable requires the model creator to make sure that the TF computation graph supports fine-tuning. For example, if the model contains dropout or batch norm, the computation graphs for fine-tuning and for inference-only are different. The publisher/creator of the model has to make sure that the model is exported correctly to support both these cases. Sometimes publishers do not go through these steps and mark the model as non-fine-tunable.
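
As a rough sketch of what "fine-tunable" means in practice: if the publisher exported the model to support it, you can pass trainable=True to hub.KerasLayer and its weights are updated during training; otherwise you keep trainable=False and only train the new head. The handle below is a MobileNet feature vector used purely as an example of a fine-tunable module (it is not the food classifier from the question), and the class count is made up.

    import tensorflow as tf
    import tensorflow_hub as hub

    feature_extractor = hub.KerasLayer(
        "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
        trainable=True,                 # only works if the module was exported for fine-tuning
        input_shape=(224, 224, 3))

    model = tf.keras.Sequential([
        feature_extractor,
        tf.keras.layers.Dense(10, activation="softmax")   # hypothetical number of food classes
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])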

Using TensorFlow classification for feature extraction

I am currently working on a system that extracts certain features from 3D objects (voxel grids, to be precise), and I would like to compare those features to automatically learned features in terms of classification performance in a TensorFlow CNN with some other data, but that is not the point here, just background.
My idea now was to take a dataset (ModelNet10), train a TensorFlow CNN to classify it, and then use what the network learned there on my dataset, not to classify, but to extract features.
So I want to throw away everything the CNN does, except for what it takes from the objects.
Is there any way to get these features? And how do I do that? I certainly have no idea.
Yes, it is possible to use trained models purely for feature extraction. This is called transfer learning: you can either train your own model and then extract the features, or you can extract features from a pre-trained model and use them in your task, provided your task is similar in nature to what the pre-trained model was trained for. You can of course find a lot of material online on these topics, but the links below give details on how to go about it, and a short sketch follows them:
https://keras.io/api/applications/
https://keras.io/guides/transfer_learning/
https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
https://www.pyimagesearch.com/2019/05/27/keras-feature-extraction-on-large-datasets-with-deep-learning/
https://www.kaggle.com/angqx95/feature-extractor-fine-tuning-with-keras
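
Here is a minimal sketch (my own, not taken from the links above) of reusing a trained classifier as a feature extractor in Keras: build a second model that stops at one of the hidden layers, then call predict() to get feature vectors. The saved file "modelnet10_cnn.h5" and the layer name "features" are placeholders for whatever CNN you trained on ModelNet10.

    import numpy as np
    import tensorflow as tf

    trained_cnn = tf.keras.models.load_model("modelnet10_cnn.h5")   # hypothetical saved model

    # keep everything up to a hidden layer, e.g. the last layer before the softmax
    feature_model = tf.keras.Model(
        inputs=trained_cnn.input,
        outputs=trained_cnn.get_layer("features").output)

    voxel_batch = np.random.rand(4, 32, 32, 32, 1).astype("float32")  # stand-in for your voxel grids
    features = feature_model.predict(voxel_batch)                     # one feature vector per object
    print(features.shape)
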

Fine-tune a BERT model for context-specific embeddings

I'm trying to find information on how to train a BERT model, possibly from the Huggingface Transformers library, so that the embeddings it outputs are more closely related to the context of the text I'm using.
However, all the examples I'm able to find are about fine-tuning the model for another task, such as classification.
Would anyone happen to have an example of fine-tuning BERT on masked tokens or next sentence prediction, such that it outputs another raw BERT model that is fine-tuned to the context?
Thanks!
Here is an example from the Transformers library on fine-tuning a language model for masked token prediction.
The model used is one of the BERT language-modeling family (e.g. BertForMaskedLM). The idea is to create a dataset using TextDataset, which tokenizes the text and breaks it into chunks, then use a DataCollatorForLanguageModeling to randomly mask tokens in the chunks during training, and pass the model, the data and the collator to the Trainer to train and evaluate the results.
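
A minimal sketch of that recipe follows, assuming a plain-text file "domain_corpus.txt" (a hypothetical name) holding your context-specific text. TextDataset is the legacy helper described above; newer Transformers versions favour the datasets library, but the overall flow is the same.

    from transformers import (AutoTokenizer, AutoModelForMaskedLM, TextDataset,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    dataset = TextDataset(tokenizer=tokenizer,
                          file_path="domain_corpus.txt",   # your raw, context-specific text
                          block_size=128)                  # chunk length in tokens
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                               mlm=True, mlm_probability=0.15)

    args = TrainingArguments(output_dir="bert-domain-mlm",
                             num_train_epochs=1,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=dataset).train()

    # the result is a "raw" BERT adapted to your context, usable for embeddings
    model.save_pretrained("bert-domain-mlm")
    tokenizer.save_pretrained("bert-domain-mlm")
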

Feature extraction using deep learning for more than one class

I have a time series dataset and I want to extract its features using a BRNN or a CNN-RNN (Python programming language).
First, I train the model with the classification layer and obtain the best accuracy.
Then, I want to take the features from one of the model's hidden layers.
But I have 41 classes. How can I extract the features for all these classes, and how do I know which class the extracted features belong to?
Because I want each class with its extracted features to make some calculations later.
A lot of it depends on what you're doing. Generally, you'll want to try an autoencoder or a transformer and do unsupervised/semi-supervised learning.
Here's some material that might give some insight into some methods (a short sketch of plain hidden-layer feature extraction follows the links).
https://arxiv.org/abs/2006.07733 (Bootstrap your own latent -- Deepmind)
https://arxiv.org/abs/1610.02242 (Temporal ensembling for semi-supervised learning -- Laine & Aila)
https://www.coursera.org/lecture/intro-to-deep-learning/autoencoders-101-QqBOa (Autoencoder -- Andrew Ng)
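
For the specific question of knowing which class the extracted features belong to: the features come out row-aligned with the inputs, so you simply keep the label array next to them. Here is a minimal sketch (mine), assuming you already have a trained Keras BRNN/CNN-RNN saved as "my_brnn.h5" with a hidden layer named "feature_layer", plus arrays for the series and their 41 class labels; all of those names are placeholders.

    import numpy as np
    import tensorflow as tf

    model = tf.keras.models.load_model("my_brnn.h5")          # hypothetical trained classifier

    extractor = tf.keras.Model(inputs=model.input,
                               outputs=model.get_layer("feature_layer").output)

    X = np.load("series.npy")        # shape (n_samples, timesteps, channels), placeholder
    y = np.load("labels.npy")        # shape (n_samples,), values 0..40, placeholder

    features = extractor.predict(X)  # row i of features belongs to class y[i]

    # group the feature vectors per class for later calculations
    features_by_class = {c: features[y == c] for c in range(41)}
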

Is it possible to validate a deep learning model by training on a small data subset?

I am looking to train a large model (ResNet or VGG) for face identification.
Is it a valid strategy to train on a few faces (1..3) to validate a model?
In other words, if a model learns one face well, is that evidence that the model is good for the task?
The point here is that I don't want to spend a week of expensive GPU time only to find out that my model is no good, the data has errors, or my TF code has a bug.
Short answer: no, because deep learning works well on huge amounts of data.
Long answer: no. The problem is that learning only one face could overfit your model to that specific face, without learning features that are not present in your examples. For example, the model may have learned to detect your face thanks to a very simple, specific pattern in that face (that's called overfitting).
To give a deliberately simple example, your model may have learned to detect that face only because there is a mole on your right cheek, and it has learned to identify that mole rather than the face.
To make your model perform well in the general case, you need a huge amount of data, forcing the model to learn different kinds of patterns.
Suggestion:
Because training a deep neural network is a time-consuming task, usually one does not train a single neural network at a time; instead, many neural networks are trained in parallel with different hyperparameters (layers, nodes, activation functions, learning rate, etc.), as in the sketch below.
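
One common way to run such a hyperparameter sweep (my addition, not something the answerer named, and using the KerasTuner library as an assumption) is a small random search; the data here is random and only stands in for a real training set.

    import numpy as np
    import tensorflow as tf
    import keras_tuner as kt

    def build_model(hp):
        # each trial samples a different hidden width and learning rate
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(hp.Int("units", min_value=32, max_value=256, step=32),
                                  activation="relu", input_shape=(20,)),
            tf.keras.layers.Dense(2, activation="softmax"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
                      loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        return model

    tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                            max_trials=5, directory="tuning", project_name="demo")

    x = np.random.rand(256, 20).astype("float32")     # placeholder features
    y = np.random.randint(0, 2, size=256)             # placeholder labels

    tuner.search(x, y, epochs=2, validation_split=0.2)
    best_model = tuner.get_best_models(num_models=1)[0]
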
Edit, following the discussion below:
If your dataset is small, it is very hard to get good performance in the general case, because the neural network will learn the easiest pattern, which is usually not the general/better one.
By adding data you force the neural network to extract good patterns, ones that work in the general case.
It's a trade-off, but training on a small dataset usually does not lead to a good classifier for the general case.
Edit 2, rephrasing everything to make it clearer: good performance on a small dataset does not tell you whether your model, once trained on the whole dataset, will be a good model. That's why you train on the majority of your dataset and test/validate on a smaller part of it.
For face recognition, usually a siamese net or triplet loss is used. This is an approach for one-shot learning, which means it can perform really well given only a few examples per class (a person's face here), but you still need to train it on many examples (different people's faces); a tiny sketch of the triplet loss appears at the end of this answer. See for example:
https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d
You wouldn't train your model from scratch, but would use a pretrained model anyway and fine-tune it for your task.
You could also have a look at pretrained face recognition models for better results, like FaceNet:
https://github.com/davidsandberg/facenet
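
Here is a minimal sketch of the triplet-loss idea mentioned above (my own illustration, not the FaceNet repo's code): an embedding network maps faces to vectors, and the loss pulls an anchor towards a positive example of the same person while pushing it away from a negative example of a different person. The network, image size, and margin are illustrative assumptions.

    import tensorflow as tf

    def make_embedder(embedding_dim=128):
        # tiny stand-in CNN; a real system would use a much larger backbone
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(96, 96, 3)),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(embedding_dim),
            tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)),
        ])

    def triplet_loss(anchor, positive, negative, margin=0.2):
        pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
        neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
        return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))

    embedder = make_embedder()
    a = tf.random.uniform((8, 96, 96, 3))   # anchor faces (random stand-ins)
    p = tf.random.uniform((8, 96, 96, 3))   # same identities as the anchors
    n = tf.random.uniform((8, 96, 96, 3))   # different identities

    loss = triplet_loss(embedder(a), embedder(p), embedder(n))
    print(float(loss))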
