I can't for the life of me figure out how to generate text from the default model while feeding in a prefix:
I have downloaded the model and here is my code:
import gpt_2_simple as gpt2
model_name = "124M"
sess = gpt2.start_tf_sess()
gpt2.generate(sess, model_name=model_name)
gpt2.generate(sess, model_name=model_name, prefix="<|My name is |>")
However, when I run it I get the following error:
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found. (0) Failed precondition: Attempting to use uninitialized value model/h3/mlp/c_proj/w [[{{node model/h3/mlp/c_proj/w/read}}]] [[strided_slice/_33]] (1) Failed precondition: Attempting to use uninitialized value model/h3/mlp/c_proj/w [[{{node model/h3/mlp/c_proj/w/read}}]]
Any idea what I'm doing wrong?
You are trying to generate without loading parameters first.
It seems that the downloaded models are used for training ("finetuning") but they are not loaded for generation.
For generation, the library tries to run a previously saved Tensorflow model ("checkpoints" in TF terminology).
Finetuning
You can generate a checkpoint by training the model for a few epochs on your own dataset (or working from the dataset published by the researchers).
Otherwise, gpt-2-simple makes it easy. Take a text file with some text and finetune the model on it:
gpt_2_simple --sample_every 50 finetune yourtext.txt
Let it run for a few epochs and have a look at the resulting samples. A checkpoint will be saved every 100 epochs. Once you are happy, hit CTRL+C and it will save a final checkpoint.
You can then generate text using:
gpt_2_simple generate --prefix "Once upon a time" --nsamples 5
The gpt_2_simple tool accepts a -h argument for help. Have a look at the other options. Using the library from code is similar to this tool workflow.
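For reference, a rough library-side equivalent of that CLI workflow might look like the sketch below (a minimal sketch only; argument defaults such as steps and sample_every vary between gpt-2-simple versions, so check yours):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")  # fetch the base model once

sess = gpt2.start_tf_sess()
# Finetune on a local text file; checkpoints are written to checkpoint/run1
gpt2.finetune(sess, "yourtext.txt", model_name="124M",
              sample_every=50, steps=1000)
# Generate from the freshly finetuned checkpoint
gpt2.generate(sess, prefix="Once upon a time", nsamples=5)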
Generating without finetuning
The author explains in this GitHub issue the procedure for skipping finetuning entirely. Just copy the model into the checkpoint directory (you need to download the model first; have a look at that link):
mkdir -p checkpoint/
cp -r models/345M checkpoint/run1
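Once the model is copied, generating from Python is a matter of loading that checkpoint into the session before calling generate; a minimal sketch, assuming the checkpoint/run1 layout created above:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
# Load the copied checkpoint; this is the step missing in the question's code
gpt2.load_gpt2(sess, run_name="run1")
gpt2.generate(sess, prefix="My name is", nsamples=1)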
I'm training a custom named entity recognition model. I created the config.cfg and train.spacy files; among other settings, I'm using en_core_web_lg as pre-trained vectors:
[paths]
train = null
dev = null
vectors = "en_core_web_lg"
init_tok2vec = null
I then train the model using
!python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy
This works and I can see the output model.
Then I want to train another NER model that has nothing to do with the previous one (same code, different data), and I get this error:
Error: [E884] The pipeline could not be initialized because the vectors could not be found at 'en_core_web_lg'.
If your pipeline was already initialized/trained before, call 'resume_training' instead of 'initialize', or initialize only the components that are new.
It looks like it modified the base en_core_web_lg model, which can be a problem for me since I use it for different models, some fine-tuned and others just out of the box.
How can I train this NER model while making sure the downloaded en_core_web_lg model is not modified? And would this ensure that I can train several models without them interfering with each other?
When you use a model as a source of vectors, or for that matter a source for any other part of a pipeline, spaCy will not modify it under any circumstances. Something else is going on.
Are you perhaps using a different virtualenv? Does spacy.load("en_core_web_lg") work?
One thing that could be happening (though it seems less likely) is that in some fields you can use either the name of an installed pipeline (via entry points) or a local path. If you have a directory named en_core_web_lg in the directory where you are training, that local path could be picked up first.
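A quick way to test both hypotheses; a minimal sketch, to be run from the directory where you launch training:

import spacy
from pathlib import Path

# 1) Does the installed package resolve in the current environment?
nlp = spacy.load("en_core_web_lg")
print(nlp.meta["lang"], nlp.meta["name"], nlp.meta["version"])

# 2) Is a local directory shadowing the package name?
print(Path("en_core_web_lg").exists())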
I need to have a frozen graph (GraphDef file) while using TensorFlow 2.x.
That is because I use a tool which expects a frozen graph; however, my training needed to be done with TF2.x and Keras.
I tried many different ways to save my TF2 model. The variant with which I was able to get the most useful formats is the following:
sess = tf.compat.v1.Session()
saver = tf.compat.v1.train.Saver(var_list=cnn.trainable_variables)
save_path = saver.save(sess, os.path.join(CHKPT_DIR, CHKPT_FILE))
tf.compat.v1.train.write_graph(sess.graph_def, CHKPT_DIR, TRAIN_GRAPH, as_text=False)
That way I was able to get the following files:
float_model.ckpt.data-00000-of-00001
float_model.ckpt.index
checkpoint
training_model.pb
Of these files I need the *.ckpt files and training_model.pb to freeze my model. However, when using the freeze_graph.sh script (with TF1.x, in a different virtual environment), it throws the error
ValueError: No variables to save
This is despite my passing the variables as a list via var_list=cnn.trainable_variables. cnn.trainable_variables is also not empty and seems to contain all the variables my model uses.
Thus, I tried the following method, following TF2.x conventions (assuming cnn is my model):
cnn.save(CHKPT_PATH)
checkpoint = tf.train.Checkpoint(cnn)
save_path = checkpoint.save(CHKPT_PATH)
Here I get the following files:
float_model.ckpt-1.data-00000-of-00001
float_model.ckpt-1.index
checkpoint
floating_model.ckpt/keras_metadata.pb
floating_model.ckpt/saved_model.pb
floating_model.ckpt/assets
floating_model.ckpt/variables
But here is where I get confused. Is there some kind of frozen graph already available? Or is there some kind of equivalent here? And if not, how do I get one with TF2.x, if that is possible? I found the sentence
The .save() method is already saving a *.pb ready for inference.
in this post. So the frozen graph is ready for inference, and thus one of these files must be equivalent to a frozen graph, right?
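For what it's worth, a commonly used way to obtain a frozen GraphDef from a TF2 SavedModel is convert_variables_to_constants_v2. The following is only a sketch, assuming cnn.save(CHKPT_PATH) wrote a Keras SavedModel with a single input, and reusing the question's CHKPT_PATH and CHKPT_DIR names; whether the tool consuming the graph accepts the result is untested:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

model = tf.keras.models.load_model(CHKPT_PATH)  # the SavedModel written by cnn.save()

# Wrap the model in a concrete function, then fold the variables into constants
full_model = tf.function(lambda x: model(x))
concrete = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))
frozen = convert_variables_to_constants_v2(concrete)

# Serialize the now variable-free graph as a binary GraphDef
tf.io.write_graph(frozen.graph.as_graph_def(), CHKPT_DIR,
                  "frozen_graph.pb", as_text=False)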
Similar to this question:
Where can I find model.ckpt in faster_rcnn_resnet50_coco model? (this solution doesn't work for me)
I have downloaded the ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8 with the intention of using it as a starting point. I am using the sample model configuration associated with that model in the TF model zoo.
I am only changing num_classes and the paths for fine-tuning, training, and eval.
With:
fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\model.ckpt"
I get:
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\model.ckpt
With:
fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\ckpt-0.*"
I get:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm currently using absolute paths because it's easiest, but if it's a problem I can re-organize my project structure.
Checkpoint Folder
The official documentation from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
says to do something like
fine_tune_checkpoint: a path prefix to the pre-existing checkpoint (ie:"/usr/home/username/checkpoint/model.ckpt-#####").
Is there something I am doing wrong here? I am running this with the following command (also from documentation):
python object_detection/model_main_tf2.py \
--pipeline_config_path="C:\\Users\\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\my_model.config" \
--model_dir="C:\\Users\\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\training" \
--alsologtostderr
Try changing the fine_tune_checkpoint path in the config file to something like path_to_folder/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0
And in your training command, set the --model_dir flag to just point to the model directory; don't include training. Kind of like --model_dir=<path_to>/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8
Source
Just change the backslashes to forward slashes, since you're on Windows.
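Combining the two answers, the relevant config lines would use the ckpt-0 prefix with forward slashes, roughly like this (the path mirrors the question's; fine_tune_checkpoint_type is the companion field usually set alongside it in TF2 configs):

fine_tune_checkpoint: "C:/Users/Peter/Desktop/Adv-ML-Project/models/research/object_detection/test_data/checkpoint/ckpt-0"
fine_tune_checkpoint_type: "detection"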
I have the following problem: when I retrain the TF Object Detection API with my own dataset, the training is often killed and I don't know the reason. There is no error log; it is just killed.
Moreover, why are only a few model.ckpt-XXXX files saved in my MODEL_DIR?
Secondly, when I try to export the above model to a frozen graph with the provided script, I see in the analysis that there is an incomplete shape:
================== Model Analysis Report ======================
Incomplete shape.
I used a model.ckpt-XXXX from after the training process was killed; is that the reason why the shape is incomplete?
The exported model can be used for inference, but I guess it is not optimal...
FYI, I have retrained the MobileNet-SSD v2 model with 1 class, and I have modified the pipeline config file as follows:
I changed the number of classes to 1
in the train_config {} part, I changed the batch size to 12 and set the number of steps to 200
in the train_input_reader and eval_input_reader {} parts, I added the paths to my TF records and labelmap.pbtxt
in the eval_config {} part, I changed num_examples to 85 (the number of pictures in my eval images directory) and max_evals to 5.
I use Ubuntu 16.04 with tensorflow-gpu 1.12.0 in a virtualenv with Python 2.7.
Thank you in advance.
If you are using tensorflow-gpu and you have a GPU, 200 is a really low number that you will reach in a few minutes (and your conv-net will learn nothing). Increase it to at least 100,000.
Moreover, due to the low number of training steps, you can expect training to save your model only at the start (step 0) and at the end of training (step 200), so you get only 2 checkpoints.
TensorFlow also saves a model every 600 seconds if you don't change save_interval_secs inside trainer.py.
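In pipeline-config terms, that advice amounts to raising num_steps in the train_config block; a minimal sketch keeping the batch size from the question:

train_config {
  batch_size: 12
  num_steps: 100000
}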
I am writing a neural network in tensorflow and I want to be able to export my final trained network and import it in another program to play a game. I have found multiple forum posts like:
Tensorflow: How to use a trained model in a application?
Tensorflow: how to save/restore a model?
I also saw in the TF documentation that they were using estimators to save the model, but I am not sure if that is what I'm looking for or how to apply it.
But those talk about exporting the entire session, importing it into the application, and using Session.run; as I understand it, that requires feeding in the expected output and will run another training step on my network. I don't want to continue training my network - it's finished - I now want it to evaluate only a specific state given to me by the game.
Thanks in advance for any help available.
As far as I know, there are two ways of doing it:
checkpoint files (metagraph)
SavedModel
SavedModel is very convenient, but its learning curve is steeper than that of checkpoint files. You can check this tutorial.
Also, importing a model does not continue the training; it basically restores all the variables you learned.
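To illustrate the checkpoint route: restoring and then fetching only the output tensor runs no training step, because the optimizer op is never part of the sess.run call. This is a minimal sketch; the tensor names input:0 and logits:0 are hypothetical stand-ins for whatever your graph actually calls them:

import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph structure from the metagraph, then restore the weights
    saver = tf.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")

    graph = tf.get_default_graph()
    x = graph.get_tensor_by_name("input:0")        # assumed input placeholder name
    logits = graph.get_tensor_by_name("logits:0")  # assumed output tensor name

    # Evaluate one state; no optimizer op is fetched, so nothing trains
    prediction = sess.run(logits, feed_dict={x: game_state})  # game_state: the state array your game provides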