So I have this code for an ASR using transformers: https://www.kaggle.com/code/bernardoolisan/speechrecognition-dot. The problem is that I was using a dataset of only 25 hours of audio; the ASR works, but on new data it performs badly. That is because of the 25 hours, you should use at least 100+ hours for training.
So I decided to use the LibriSpeech dataset, which provides 100, 360, and 1000 hours of training audio. The thing is, when I train on LibriSpeech I get
loss:nan
Picture of loss:nan when training
Why am I getting loss:nan? It was working with the old dataset but now it is not, and I did not change any parameters...
Any idea?
Related
I developed an ML model for my research, and it needs to be trained on a large amount of data; I have to train it for 100 epochs. But my MacBook (M2, 13") can't handle it, and I also need my laptop for studying, so I can't leave it training all day. What I want to know: if I split the training into 10 epochs at a time and train the model on the same dataset over 10 days, will it give the same result as training for 100 epochs in one go?
I use YOLOv5.
It would be easier to answer if you told us which libraries you're using for your project.
But you should look into training checkpoints: these let you train for a certain number of epochs, then resume from the last checkpoint and train another X epochs later (see the sketch below).
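A minimal sketch of that idea in plain PyTorch (the helper names here are illustrative, not YOLOv5 internals; as far as I remember, YOLOv5's own train.py also has built-in support for resuming from its last saved checkpoint):

import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    # Persist everything needed to continue training later.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    # Restore model and optimizer state; return the epoch to resume from.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1

# Day 1: train epochs 0-9, then save_checkpoint(model, optimizer, epoch=9).
# Day 2: start_epoch = load_checkpoint(model, optimizer); continue with epochs 10-19, and so on.

As long as the optimizer state (and any learning-rate schedule) is restored along with the weights, ten short runs should behave very much like one long run, apart from differences in data shuffling order.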
I'm using XGBoost (with default settings) to forecast an hourly time series (720 hours is the size of my test set). I always run the model 10 times, and in the end I analyse the 10 runs. With XGBoost I'm getting exactly the same predictions, for all 720 hours, in all 10 runs, without any change. I'm using the default version of XGBoost and have already tried changing the seed, putting a different seed number in each run, and changing random_state too, all with no success. The only thing that changes the output is setting subsample = 0.99, but my professor told me he wants to use the whole training dataset for subsample, and with subsample = 1 I get the same problem of 10 identical predictions. I have already checked my code and there is no error in the way I'm predicting the time series.
I'm using XGBoost this way:
import xgboost as xg

for run in range(10):
    model = xg.XGBRegressor().fit(X_train, y_train)
    preds = model.predict(X_test)  # forecast the 720-hour test window
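For reference, this is the seed-per-run variant described above (X_train, y_train and X_test are assumed to be defined already); as far as I know, random_state only matters when something stochastic such as subsampling is enabled:

for run in range(10):
    # A different seed per run, combined with row subsampling so the
    # seed actually has something to act on.
    model = xg.XGBRegressor(subsample=0.99, random_state=run)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)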
Thanks for the help.
I have a dataset (around 120k entries, 8 MB, 4 columns, one of them text). I ran a MultinomialNB to classify the text column in order to predict its class (another column).
I did that with the pipeline below (the text column goes through a text-cleaning process, including stopword removal, prior to the pipeline).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

text_clf_comp = Pipeline([('vect', CountVectorizer(ngram_range=(1, 6))),
                          ('tfidf', TfidfTransformer(use_idf=False)),
                          ('clf', MultinomialNB(alpha=0.01))])
text_clf_comp = text_clf_comp.fit(X_train_comp, y_train)
The parameters were optimized using GridSearch, along the lines of the sketch below.
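A GridSearchCV over the pipeline's step names looks roughly like this; the parameter values below are placeholders, not the grid I actually used:

from sklearn.model_selection import GridSearchCV

param_grid = {
    'vect__ngram_range': [(1, 2), (1, 4), (1, 6)],
    'tfidf__use_idf': [True, False],
    'clf__alpha': [0.01, 0.1, 1.0],
}
search = GridSearchCV(text_clf_comp, param_grid, cv=3, n_jobs=-1)
search.fit(X_train_comp, y_train)
print(search.best_params_)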
The pipeline and fit take 17 s and the model is very good at predicting.
The problem occurs when I try to save the model using joblib or pickle. It creates a 300 MB file and takes 7 minutes to run, which doesn't make sense considering the time to train and the size of the data.
import joblib
saved_model = joblib.dump(text_clf_comp, 'saved_model.joblib')
I created an LSTM model that takes about 1 hour to train, and saving it took a couple of seconds and 2 MB.
Right now it is better to retrain my MultinomialNB classifier every time than to save and load it.
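For completeness, joblib can also write compressed files; a quick sketch (the compression level here is arbitrary):

import joblib

# Compressed dump: trades some save/load time for a smaller file on disk.
joblib.dump(text_clf_comp, 'saved_model.joblib', compress=3)

# Loading it back later:
text_clf_comp = joblib.load('saved_model.joblib')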
I'm training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as stated in its documentation. The dataset is the Common Voice dataset for the Persian language.
My configurations are as follows:
Batch size = 2 (due to CUDA OOM)
Learning rate = 0.0001
Num. neurons = 2048
Num. epochs = 50
Train set size = 7500
Test and Dev sets size = 5000
Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
Train and val losses decrease during training, but after a few epochs the val loss stops decreasing. Train loss is about 18 and val loss is about 40.
The predictions are all empty strings at the end of the process. Any ideas on how to improve the model?
The Persian dataset in Common Voice has around 280 hours of validated audio, so this should be enough to create a model with better accuracy than you're reporting.
What would help here is knowing the CER and WER figures for the model. Seeing these indicates whether the best course of action lies with the hyperparameters of the acoustic model or with the KenLM language model. The difference is explained in the testing section of the DeepSpeech PlayBook.
It is also likely that you will need to perform transfer learning on the Persian dataset. I am assuming that the Persian dataset is written in the Perso-Arabic script (Alefbā-ye Fārsi). This means that you need to drop the alphabet layer in order to learn from the English checkpoints (which use the Latin script).
More information on how to perform transfer learning is in the DeepSpeech documentation, but essentially, you need to do two things:
Use the --drop_source_layers 3 flag to drop the source layers, to allow for transfer learning from another alphabet
Use the --load_checkpoint_dir deepspeech-data/deepspeech-0.9.3-checkpoint flag to specify where to load checkpoints from on which to perform transfer learning.
Maybe you need to decrease the learning rate or use a learning rate scheduler; a generic sketch of the scheduler idea follows.
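This is plain Keras rather than DeepSpeech's own training flags (check the DeepSpeech documentation for the equivalent options); the values are placeholders, not tuned recommendations:

import tensorflow as tf

# Reduce the learning rate when the validation loss stops improving.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # quantity to watch
    factor=0.5,          # halve the learning rate on a plateau
    patience=3,          # epochs with no improvement before reducing
    min_lr=1e-6,         # lower bound for the learning rate
)

# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[reduce_lr])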
Although it is stated in the slim documentation that train_image_classifier.py can be used to train models from scratch, I found it hard in practice. In my case, I am trying to train ResNet from scratch on a local machine with 6 K80s. I used this:
DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_RF_record
TRAIN_DIR=/nv/hmart1/ashaban6/scratch/train_dir
DEPTH=50
NUM_CLONES=8
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7,8" python train_image_classifier.py --train_dir=${TRAIN_DIR} --dataset_name=imagenet --model_name=resnet_v1_${DEPTH} --max_number_of_steps=100000000 --batch_size=32 --learning_rate=0.1 --learning_rate_decay_type=exponential --dataset_split_name=train --dataset_dir=${DATASET_DIR} --optimizer=momentum --momentum=0.9 --learning_rate_decay_factor=0.1 --num_epochs_per_decay=30 --weight_decay=0.0001 --num_readers=12 --num_clones=$NUM_CLONES
I followed the same settings as suggested in the paper. I am using 8 GPUs on a local machine with batch_size 32, so the effective batch size is 32x8 = 256. The learning rate is initially set to 0.1 and is decayed by a factor of 10 every 30 epochs. After 70K steps (70000x256/1.2e6 ≈ 15 epochs), the top-1 performance on the validation set is as low as ~14%, while it should be around 50% after that many iterations. I used this command to get the top-1 performance:
DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_RF_record
CHECKPOINT_FILE=/nv/hmart1/ashaban6/scratch/train_dir/
DEPTH=50
CUDA_VISIBLE_DEVICES="10" python eval_image_classifier.py --alsologtostderr --checkpoint_path=${CHECKPOINT_FILE} --dataset_dir=${DATASET_DIR} --dataset_name=imagenet --dataset_split_name=validation --model_name=resnet_v1_${DEPTH}
With the lack of working examples, it is hard to say whether there is a bug in the slim training code or a problem in my script. Is there anything wrong in my script? Has anyone successfully trained ResNet from scratch this way?
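For reference, the epoch estimate above is just steps times effective batch size divided by the dataset size:

# Back-of-the-envelope check of the "~15 epochs" figure above.
steps = 70_000
batch_per_gpu = 32
num_gpus = 8
train_set_size = 1.2e6                           # round figure for the ImageNet train split

effective_batch = batch_per_gpu * num_gpus       # 256
images_seen = steps * effective_batch            # 17,920,000
epochs = images_seen / train_set_size            # ~14.9
print(f"effective batch = {effective_batch}, epochs ~ {epochs:.1f}")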