Gensim Model: FileNotFoundError when loading a saved model - python

The issue is that I have thousands of documents, and I passed all of them to train a Gensim Doc2Vec model. I successfully trained and saved the model in .model format.
With this save, two extra files were also generated alongside the main one:
doc2vec.model
doc2vec.model.trainables.syn1neg.npy
doc2vec.model.wv.vectors.npy
Due to hardware limitations, I trained the model on Google Colab and saved it to Google Drive. When I downloaded the generated model and the extra files to my local machine and ran the code, it gave me a FileNotFoundError, even though I placed those files in the same directory as the .py file (the current working directory).
Here is the code I used:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

files = readfiles("CuratedData")
data = [TaggedDocument(words=word_tokenize(_d.decode('utf-8').strip().lower()), tags=[str(i)])
        for i, _d in enumerate(files)]

max_epochs = 100
vec_size = 300
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=1)
model.build_vocab(data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(data,
                total_examples=model.corpus_count,
                epochs=model.iter)
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("doc2vec.model")
print("Model Saved")
Code for loading the model:
import os
import sys
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

webVec = ""
try:
    path = os.path.join(os.getcwd(), "doc2vec.model")
    model = Word2Vec.load(path)
    data = word_tokenize(content['htmlResponse'].lower())
    # Webvector
    webVec = model.infer_vector(data)
except ValueError as ve:
    print(ve)
except (TypeError, ZeroDivisionError) as ty:
    print(ty)
except:
    print("Oops!", sys.exc_info()[0], "occurred.")
Any help would be greatly appreciated. Thanks, Cheers

Saving a large model will usually create several subsidiary files for the large internal arrays. All those files must be kept together. (They will all start with the same string, the name you originally specified - in your case, doc2vec.model.)
It's possible there was another file you failed to download. But without seeing the code you used to trigger the error, or the full error traceback (with the filenames and lines of code involved), it's hard to guess what exactly you did to trigger a FileNotFoundError. You may want to edit your question to add that info, so it's clearer what code you ran, and what library code was involved in, the exact error.
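As a quick sanity check, a minimal sketch like the following (assuming the files were downloaded next to your .py file) lists every file sharing the model's name prefix, so you can confirm nothing is missing before loading:

import glob
import os

# All subsidiary files share the "doc2vec.model" prefix; every one of
# them must sit in the same directory for the load to succeed.
prefix = os.path.join(os.getcwd(), "doc2vec.model")
for path in glob.glob(prefix + "*"):
    print(path)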

Related

Error: Could not read image in Google Colab

I am following this tutorial to build a custom object detection model with Detecto:
https://www.analyticsvidhya.com/blog/2021/06/simplest-way-to-do-object-detection-on-custom-datasets/
I have collected and labelled my images, put them on my Drive, and I am running the following code snippet, part of a Python notebook on Google Colab, to train the model:
from detecto import core
import matplotlib.pyplot as plt

Train_dataset = core.Dataset('/content/drive/My Drive/training model/Training', transform=custom_transforms)
Test_dataset = core.Dataset('/content/drive/My Drive/training model/Test')
loader = core.DataLoader(Train_dataset, batch_size=2, shuffle=True)
model = core.Model(['black car', 'grey car', 'white truck'])

losses = model.fit(loader, Test_dataset, epochs=25, lr_step_size=5, learning_rate=0.001, verbose=True)
plt.plot(losses)
plt.show()
However, I keep getting the following error shortly after the first epoch starts:
ValueError: Could not read image /content/drive/My Drive/training model/Training/frame22.jpg
It gives this error randomly, not only with frame22 but also with other frames that are not present in this directory. I tried remounting my Drive with force_remount enabled at the beginning of the script, but the error persists.
I checked the code of the core.Dataset implementation from Detecto, and I can confirm what I said in my comments.
The index is created by collecting all the .xml annotation files and mapping each one to its image. It does not check that the image is actually there.
For the image filename, it uses the one inside the XML file, not the name of the XML file itself. See below a view of an annotation XML file, where you can see the filename attribute. If you rename an image, you need to change the name inside its XML file as well.
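To track down the offending annotations, a minimal sketch like this (assuming Pascal VOC-style .xml files sitting next to the images in the Training folder) prints every annotation whose recorded filename has no matching image on disk:

import os
import xml.etree.ElementTree as ET

ann_dir = '/content/drive/My Drive/training model/Training'
for fname in os.listdir(ann_dir):
    if not fname.endswith('.xml'):
        continue
    # Read the filename recorded inside the annotation itself,
    # which is what Detecto will try to open
    root = ET.parse(os.path.join(ann_dir, fname)).getroot()
    image_name = root.find('filename').text
    if not os.path.exists(os.path.join(ann_dir, image_name)):
        print('Missing image for {}: {}'.format(fname, image_name))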

Loading in your own image data with tensorflow and tfds.ImageFolder

I want to train a GAN and generate images of pokemon. I scraped around 10000 images from the internet which are locally saved. My folder is structured like so:
all_data:
  - train:
      - bulbasaur.png
      - 45.png
      - ....png
  - test:
      - bulbasaur.png
      - 45.png
      - ....png
  - validation:
      - bulbasaur.png
      - 45.png
      - ....png
I tried to load it via:
builder = tfds.ImageFolder(os.path.join(os.getcwd(), "all_data"))
print(builder.info) # num examples, labels... are automatically calculated
ds = builder.as_dataset(split='train', shuffle_files=True)
tfds.show_examples(ds, builder.info)
but I get the error:
ValueError: Unrecognized split test. Subsplit API not yet supported for ImageFolder. Split name should be one of [].
Is there a problem with how I structured the dataset? As you can tell from the code snippet, the files all have completely varying names (either their English name or their Pokedex number); is that a problem? Since I do not want to classify anything, I thought the labeling was not really important.
Also, if it helps, the splits in the builder info output are empty:
tfds.core.DatasetInfo(
    ....
    supervised_keys=('image', 'label'),
    splits={
    },
    ...
)
Thanks a lot in advance!
Your folder structure should be like this:
/content/image_dir/
  train/
    cat/
      cat_1.png
      cat_2.png
      cat_3.png
    dog/
      dog_1.png
      dog_2.png
      dog_3.png
  test/
    cat.png
    dog.png
The code below works with this directory structure:
import tensorflow as tf
import tensorflow_datasets as tfds
builder = tfds.ImageFolder('/content/image_dir/')
print(builder.info) # num examples, labels... are automatically calculated
ds = builder.as_dataset(split='train', shuffle_files=True)
tfds.show_examples(ds, builder.info)
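As a quick check (a sketch, assuming the restructured directory above), the splits should now be populated instead of empty:

import tensorflow_datasets as tfds

builder = tfds.ImageFolder('/content/image_dir/')
# After restructuring, this should list the 'train' and 'test' splits
# with their example counts, rather than an empty dict.
print(builder.info.splits)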

Tensorflow frozen inference graph from .meta .info .data and combining frozen inference graphs

I am new to TensorFlow and currently struggling with some issues:
How to get a frozen inference graph from .meta, .data and .info without a pipeline config
I wanted to check pre-trained models for traffic sign detection in real time. The model contains 3 files - .meta, .data and .info - but I can't find information on how to convert them into a frozen inference graph without a pipeline config. Everything I find is either outdated or needs a pipeline config.
Also, I tried to train the model myself, but I think the problem is the .ppm files (GTSDB dataset), because with .png or .jpg everything worked just fine.
How to combine two or more frozen inference graphs
I have successfully trained a model on my own dataset (to detect a specific object), but I want that model to work together with pre-trained models like Faster R-CNN Inception or SSD MobileNet. I understand that I have to load both models, but I have no idea how to make them work at the same time - is it even possible?
UPDATE
I'm halfway there on the first problem - I now have frozen_model.pb. The problem was in the output node names; I got confused and didn't know what to put there, so after hours of "investigating" I got working code:
import os, argparse
import tensorflow as tf

# The original freeze_graph function
# from tensorflow.python.tools.freeze_graph import freeze_graph

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_dir):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constants

    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string, containing all the output node's names,
            comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % model_dir)

    # if not output_node_names:
    #     print("You need to supply the name of a node to --output_node_names.")
    #     return -1

    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We precise the file fullname of our freezed graph
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/frozen_model.pb"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph in the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # We restore the weights
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            tf.get_default_graph().as_graph_def(),  # The graph_def is used to retrieve the nodes
            [n.name for n in tf.get_default_graph().as_graph_def().node]  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, default="", help="Model folder to export")
    # parser.add_argument("--output_node_names", type=str, default="", help="The name of the output nodes, comma separated.")
    args = parser.parse_args()
    freeze_graph(args.model_dir)
I had to change a few lines - remove --output_node_names and change output_node_names in output_graph_def to [n.name for n in tf.get_default_graph().as_graph_def().node].
Now I have a new problem - I can't convert .pb to .pbtxt, and the error is:
ValueError: Input 0 of node prefix/Variable/Assign was passed float from prefix/Variable:0 incompatible with expected float_ref.
And once again, the information on this problem is outdated - everything I found is at least a year old. I'm starting to think that my fix for frozen_graph is not correct, and that is the reason why I'm getting this new error.
I would really appreciate some advice on this matter.
If you write
[n.name for n in tf.get_default_graph().as_graph_def().node]
in your convert_variables_to_constants call, you define every node the graph has as an output node, which of course will not work. (This is probably the reason for your ValueError.)
You need to find the name of the real output node. The best way is often to look at the trained model in TensorBoard and analyze the graph there, or to print out every node of your graph. Often the last node printed out is your output node (ignore everything that has 'gradients' in its name, or 'Adam' if you used that as the optimizer).
An easy way to do this (insert it after you restore the session):
gd = sess.graph.as_graph_def()
for node in gd.node:
    print(node.name)
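Once you have the real output node, a minimal sketch of the corrected call, placed inside the same tf.Session block as in the question's script, might look like this; the node name 'softmax_output' is hypothetical, so substitute whatever your graph actually prints:

# 'softmax_output' is a placeholder name; use the real output node you found
output_node_names = ['softmax_output']
output_graph_def = tf.graph_util.convert_variables_to_constants(
    sess,
    tf.get_default_graph().as_graph_def(),
    output_node_names  # only the real output node(s), not every node in the graph
)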

Export Tensorflow Estimator

I'm trying to build a CNN with TensorFlow (r1.4) based on the tf.estimator API. It's a canned model. The idea is to train and evaluate the network with the estimator in Python, and use the prediction in C++ without the estimator, by loading a .pb file generated after the training.
My first question is: is it possible?
If yes: the training part works, and the prediction part works too (with a .pb file generated without the estimator), but it doesn't work when I load a .pb file from the estimator.
I got this error: "Data loss: Can't parse saved_model.pb as binary proto"
My Python code to export my model:
feature_spec = {'input_image': parsing_ops.FixedLenFeature(dtype=dtypes.float32, shape=[1, 48 * 48])}
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

input_fn = tf.estimator.inputs.numpy_input_fn(self.eval_features,
                                              self.eval_label,
                                              shuffle=False,
                                              num_epochs=1)
eval_result = self.model.evaluate(input_fn=input_fn, name='eval')

exporter = tf.estimator.FinalExporter('save_model', export_input_fn)
exporter.export(estimator=self.model, export_path=MODEL_DIR,
                checkpoint_path=self.model.latest_checkpoint(),
                eval_result=eval_result,
                is_the_final_export=True)
It doesn't work with tf.estimator.Estimator.export_savedmodel() either.
If anyone knows of an explicit tutorial on estimators with canned models and how to export them, I'm interested.
Please look at this issue on GitHub; it looks like you have the same problem. Apparently (at least when using estimator.export_savedmodel) you should load the graph with LoadSavedModel instead of ReadBinaryProto, because it's not saved as a GraphDef file.
You'll find a bit more instruction here on how to use it:
const string export_dir = ...
SavedModelBundle bundle;
...
LoadSavedModel(session_options, run_options, export_dir, {kSavedModelTagTrain},
               &bundle);
I can't seem to find the SavedModelBundle documentation for C++ to use it afterwards, but it's likely close to the same class in Java, in which case it basically contains the session and the graph you'll be using.
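Before moving to C++, a quick Python-side sanity check can confirm the export loads with the tag you expect. Here is a minimal sketch; the export directory path and the SERVING tag are assumptions, so use whatever your exporter actually produced:

import tensorflow as tf

# Hypothetical path to the timestamped directory created by the exporter
export_dir = 'save_model/1513001272'
with tf.Session(graph=tf.Graph()) as sess:
    # Loads the graph and the variables together, unlike ReadBinaryProto
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    print('SavedModel loaded successfully')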

Exporting and loading models

System information
OS Platform and Distribution: macOS Sierra (10.12.5)
TensorFlow installed from: pip
TensorFlow version: 1.2.1
The problem:
I'm trying to save and restore a model trained in Python, from Python.
I have the model saved in three .chkpt files (meta, index and data-00000-of-00001), and I'm trying to read it into my session, save the model using add_meta_graph_and_variables, and then read it again using the loader: loader.load(session, [tf.saved_model.tag_constants.TRAINING], pathToSaveModel).
This is my code.
First, I restore the weights from the three files containing "data", "index" and "meta" (the metagraph and the weights) into my session using saver.restore:
with tf.Session(graph=tf.Graph()) as session:
    ## HERE IS THE CODE OF MY NETWORK (Very long)
    session.run(tf.global_variables_initializer())

    # Load
    saver = tf.train.Saver()
    saver.restore(session, "newModel.chkpt")

    features = loadFeatures(["cat2.jpg"])
    res = predictions.eval(
        feed_dict={
            x: features,
            keep_prob: 1.0, })
    print('Image {} has a prob {} '.format(image, res))

    b = saved_model_builder.SavedModelBuilder(pathToSaveModel)
    b.add_meta_graph_and_variables(session, [tf.saved_model.tag_constants.TRAINING])
    b.save()
With this code I get a good classification and, finally, a new folder containing the model saved with add_meta_graph_and_variables.
Now I want to use the saved model to classify the same image again. This time I used the loader instead of restore:
with tf.Session(graph=tf.Graph()) as session:
    ## HERE IS THE CODE OF MY NETWORK (Very long)
    # session.run(tf.global_variables_initializer())

    # Load
    from tensorflow.python.saved_model import loader
    loader.load(session, [tf.saved_model.tag_constants.TRAINING], pathToSaveModel)

    features = loadFeatures(["cat2.jpg"])
    res = predictions.eval(
        feed_dict={
            x: features,
            keep_prob: 1.0, })
    print('Image {} has a prob {} '.format(image, res))
And here comes the problem:
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value b_fcO
    [[Node: b_fcO/read = Identity[T=DT_FLOAT, _class=["loc:@b_fcO"], _device="/job:localhost/replica:0/task:0/cpu:0"](b_fcO)]]
If I use session.run(tf.global_variables_initializer()), then it works, but the classification is not valid; I think the weights are not being exported/imported properly from the very beginning, and after testing many things I'm stuck here.
Any clues about what I'm doing wrong?
Thanks in advance.
Update:
This is how the model looks in three files at the beginning:
Just a few things you should check:
- What is pathToSaveModel?
- Where is the checkpoint file?
- Open the checkpoint file with a text editor: to what folder does it point? (See the sketch after this list.)
- Is the path to the weights correct?
By going over these questions I was always able to find the mistake I made. Hope it helps!
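For the checkpoint questions in the list above, a minimal sketch like this (the directory name is hypothetical, so point it at the folder holding your .chkpt files) shows exactly which paths the checkpoint state file records:

import tensorflow as tf

# Hypothetical checkpoint directory; replace with your own
state = tf.train.get_checkpoint_state('path/to/checkpoint_dir')
if state is not None:
    # These are the paths the 'checkpoint' file points to; if they name a
    # stale or wrong folder, restoring will fail or load the wrong weights.
    print(state.model_checkpoint_path)
    print(state.all_model_checkpoint_paths)
else:
    print('No checkpoint state file found in that directory')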
